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Summary 


Post-acute infection syndromes (PAIS) may develop after acute viral disease’. Infection with SARS- 
CoV-2 can result in the development of a PAIS known as “Long COVID” (LC). Individuals with 
LC frequently report unremitting fatigue, post-exertional malaise, and a variety of cognitive and 
autonomic dysfunctions” “*; however, the biological processes associated with the development and 
persistence of these symptoms are unclear. Here, 273 individuals with or without LC were enrolled 
in a cross-sectional study that included multi-dimensional immune phenotyping and unbiased 
machine learning methods to identify biological features associated with LC. Marked differences 
were noted in circulating myeloid and lymphocyte populations relative to matched controls, as well 
as evidence of exaggerated humoral responses directed against SARS-CoV-2 among participants 
with LC. Further, higher antibody responses directed against non-SARS-CoV-2 viral pathogens 
were observed among individuals with LC, particularly Epstein-Barr virus. Levels of soluble 
immune mediators and hormones varied among groups, with cortisol levels being lower among 
participants with LC. Integration of immune phenotyping data into unbiased machine learning 
models identified key features most strongly associated with LC status. Collectively, these findings 
may help guide future studies into the pathobiology of LC and aid in developing relevant 
biomarkers. 


Introduction 


Recovery from acute viral infections is heterogeneous and chronic.symptoms may linger for months to 
years in some individuals. Additionally, persistent sequelae may develop after acute infection by a 
number of viruses from a diverse range of viral families* ’. Post-acute infection syndromes (PAIS) 
microbial infections have also been described for over a century'”'’. Yet despite their ubiquity, the basic 
biology underlying PAIS development, even for extensively studied PAIS like myalgic 
encephalomyelitis/chronic fatigue syndrome (ME/CFS), remains unclear’. 


SARS-CoV-2 is a betacoronavirus responsible for.at least seven million deaths worldwide’. Infection 
causes COVID-19, which can manifest as a severe respiratory disease marked by extensive 
immunological and multi-organ system dysfunction'*’. Recovery from COVID-19 is often complete; 
however, individuals (even those with initially mild disease courses) may have significantly increased 
risks for adverse clinical events and abnormal clinical findings”’*°. 


In addition to developingsolated dysfunctions, some convalescent COVID-19 patients may develop a 
group of new onset or aggravated sequelae known as Long COVID (LC). Clinically, LC presents as a 
constellation of debilitating symptoms (e.g., unremitting fatigue, post-exertional malaise, cognitive 
impairment, and autonomic dysfunctions), alongside other less common manifestations” *. These 
persistent sequelae dramatically impair physical and cognitive function and reduce quality of life*®. 
Estimates of LC prevalence vary substantially~’, but prospective studies suggest about one in eight 
individuals with COVID-19 experience persistent somatic symptoms attributable to past SARS-CoV-2 
infection*®, While the underlying pathogenesis of LC remains unclear, current hypotheses include the 
persistence of virus or viral remnants in tissue reservoirs; development or aggravation of autoimmunity; 
microbial dysbiosis; reactivation of non-SARS-CoV-2 latent viral infections; and tissue damage caused 
by chronic inflammation. 


To interrogate the biological underpinnings of LC, a cross-sectional study was designed (Mount Sinai- 
Yale Long COVID, MY-LC) involving 273 participants comprising five study groups: (1) healthcare 
workers infected with SARS-CoV-2 before vaccination (HCW); (2) healthy, uninfected, vaccinated 
controls (healthy controls, HC); (3) previously infected, vaccinated controls without persistent symptoms 
(convalescent controls, CC); (4) individuals with persistent symptoms after acute infection (Long 


COVID, LC); and (5) a second group of individuals with persistent symptoms following acute infection 
from an independent study (External Long COVID, Ext. LC). Among the CC and LC groups, enrolled 
participants had primarily mild (non-hospitalised) acute COVID-19 and samples for this study were 
acquired, on average, more than a year after their acute infection. The HC, CC, and LC groups underwent 
systematic, multi-dimensional immunophenotyping and unbiased machine learning of aggregated data to 
identify potential LC biomarkers. 


Results 
Overview of MY-LC cohort 


The MY-LC study enrolled 183 participants (101 LC, 42 CC, and 42 HC) at one study site (Mount Sinai 
Hospital, New York City, New York) and 90 participants at another (Yale New Haven Hospital, New 
Haven, CT) for a total of 275 participants. After initial enrollment and preliminary review of electronic 
medical records, two participants were excluded from the LC group (2.0%, for pharmacologic 
immunosuppression secondary to primary immune deficiency and solid organ transplant); two from HC 
(4.8%, for pregnancy and misclassification at enrollment); and three from CC (7.1%, for pregnancy, 
monogenic disorder, and misclassification at enrollment) resulting in a final study size of 268 individuals 
(Fig. 1A). The proportion of participants excluded from the LC group did not significantly differ from 
those excluded from the other groups (Extended Data Table 1). 


Initial comparison of demographic factors showed the LC and CC groups differed in mean age (46 years, 
LC; 38 years, CC; Kruskal-Wallis post-hoc, p= 0.0040). But these groups did not significantly differ in 
sex; hospitalisation for acute COVID-19 (Fig. 1B); or median elapsed time between initial infection and 
acute disease (Fig. 1C). Most acute infections within the LC group (76%) occurred between 
epidemiological weeks 7-17 of 2020, when parental SARS-CoV-2 strains (WA-1) drove most new cases. 
Importantly, the aggregated medical history of individuals with LC did not significantly differ from that 
of CC individuals in baseline prevalence of anxiety or depression. Complete demographic features and 
medical histories are reported in Extended Data Table 1. 


Across all surveyed dimensions, participants with LC had significantly higher intensities of reported 
symptoms and dramatically worsened quality of life (Extended Data Table 2, Extended Data Fig. 1A). 
To address whether LC associated with any pattern of survey responses, responses were aggregated into a 
single classification metric (Long COVID Propensity Score, LCPS) using a parsimonious logistic 
regression model (LC vs. Other), which demonstrated significant diagnostic potential (0.94 AUC, 
bootstrap CI: 0.89—0.97) (Fig. 1D, Extended Data Fig. 1B, Extended Data Table 3). 


Among the self-reported symptoms from the LC group, fatigue (87%), brain fog (78%), memory 
difficulty (62%), and confusion (55%) were most common (Fig. 1E). Postural Orthostatic Tachycardia 
Syndrome (POTS) was also prevalent; 38% of individuals with LC had formal diagnostic testing and 
clinical evaluation (Extended Data Fig. 1C). Negative impacts on employment status were also reported 
by half the participants with LC (Extended Data Fig. 1D). 


To find groups of participants with LC with similar sets of self-reported symptoms, an agglomerative 
hierarchical clustering of binary symptoms was performed (Extended Data Fig. 1E). Three LC clusters 
were identified (bootstrapped mean cluster-wise Jaccard similarity: cluster 1, 0.75 [95% CI: 0.54-1.00]; 
cluster 2, 0.60 [0.47—0.94]; and cluster 3, 0.75 [0.56—1.00]). LC clusters were clearly bifurcated by LCPS: 
cluster 3 had intermediate propensity scores; clusters 1 and 2, more extreme ones (Extended Data Fig. 
1F). 


Circulating immune cell differences 
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Analysis of peripheral blood mononuclear cell (PBMC) populations revealed a significant difference in 
circulating immune cell populations among MY-LC cohorts. The median level of non-conventional 
monocytes (CD14"CD16") in the LC group was significantly higher than those in other groups 
(Extended Data Fig. 2A, left). To determine whether LC significantly associated with levels of non- 
conventional monocytes after accounting for demographic differences across groups, linear models were 
developed incorporating age, sex, LC status (binary), and body mass index (BMI). By this approach, LC 
significantly associated with levels of total non-conventional monocytes (Extended Data Fig. 3J) and 
those expressing MHC Class II (HLA-DR) (Extended Data Fig. 2A, right). Parallel investigation of 
absolute cell counts also revealed increased numbers of circulating non-conventional monocytes 
(Extended Data Fig. 4A). 


Systematic analysis of other immune effector populations revealed significantly lower circulating 
populations of cDC1s among participants with LC (Extended Data Fig. 2B, left; Extended Data Fig. 
4B). Linear models again found LC status and age significantly associated with circulating cDC1 levels 
(Extended Data Fig. 2B, right). Levels of other circulating granulocyte populations (neutrophils, 
eosinophils, conventional and intermediate monocytes, plasmacytoid dendritic, and cDC2 populations) 
did not significantly differ among groups, with substantial heterogeneities noted in LC (Extended Data 
Fig. 3A,B). 


The median relative percentage of B lymphocytes was significantly higher in both activated populations 
(CD86"HLA-DR": 17%, LC; 11%, CC; 12%, HC) and double-negative subsets (IgD /CD27/CD24- 
/CD38: 5%; 2%; 2%) (Extended Data Fig. 2C). The absolute count of double-negative B cells also 
significantly increased in individuals with LC (Extended Data Fig. 4C). LC status was again 
significantly associated with these effector populations in linear modeling (Extended Data Fig. 3J). 
Circulating levels of various B-cell subsets, including naive:B cells, did not significantly differ among 
groups (Extended Data Fig. 3C). 


Circulating T lymphocyte populations were not strikingly different in effector memory subsets (CD45RA~ 
/CD127-/CCR7_) (Extended Data Fig. 2D), although absolute counts of CD4* populations significantly 
increased (Extended Data Fig. 4D). The median relative percentage of circulating CD4 central memory 
cells (CD45RA7/CD127*/CCR7,) was significantly lower in the LC group (27%, LC; 33%, CC; 32%, 
HC), although groups did not differ by absolute count (Extended Data Fig. 4D). Median percentages of 
exhausted (PD-1*/Tim-3*) CD4* subsets (CD4:x) and exhausted CD8* subsets (CD8:x) did not significantly 
differ (Extended Data Fig. 2D), but absolute CD4:, counts were significantly elevated (Extended Data 
Fig. 4D). Importantly, neither naive CD4 nor CD8 T cells significantly differed (Extended Data Fig. 
3D). 


After being stimulated with phorbol myristate acetate and ionomycin, CD4" cells from individuals with 
LC produced significantly higher median levels of intracellular IL-2 (17%, LC; 14%, CC; 13%, HC); IL-4 
(11%; 7%; 8%); and IL-6 (1.7%; 1.4%; 1.5%) (Extended Data Fig. 2E, Extended Data Fig. 4E; top 
row), as did CD8" T cells (Extended Data Fig. 2E, Extended Data Fig. 4E, bottom row). Both age and 
LC status were significantly associated with intracellular IL-4 and IL-6 production (Extended Data Fig. 
2K, Extended Data Table 4). Notably, individuals with LC also had uniquely elevated median levels of 
IL-4/IL-6 double-positive CD4" T cells (0.3%, LC; 0.2%, CC; 0.2%, HC) and double-positive CD8* T 
cells (0.5%; 0.2%; 0.2%) (Extended Data Fig. 2F, Extended Data Fig. 4F). Levels of IFN-y and IL-17 
(in CD4") and TNF-a and GMZB (in CD8’) did not significantly differ across groups (Extended Data 
Fig. 3E—-I). To account for heterogeneous levels of circulating immune cell populations, permutational 
analysis of variance (PERMANOVA) was performed using effector populations with significant 
differences between groups at baseline. This multivariate analysis showed that LC status and age 
significantly predicted levels of circulating immune cell populations (Extended Data Fig. 2G). 


184 


185 
186 
187 
188 
189 
190 


191 
192 
193 
194 
195 
196 
197 
198 
199 
200 
201 
202 
203 
204 
205 
206 
207 
208 
209 
210 
211 
212 
213 
214 


215 


216 
217 
218 
219 
220 
221 
222 
223 
224 
225 
226 
227 
228 
229 
230 


SARS-CoV-2 specific antibody responses 


Initial analysis of anti-SARS-CoV-2 antibody responses was performed only for MY-LC participants who 
received two doses of vaccine. Anti-S1 IgG levels in the LC group were significantly higher than those in 
the CC group, and levels of total anti-S and anti-receptor-binding domain (RBD) IgG were elevated in 
the LC group but did not significantly differ from CC-group levels (Fig. 2A). Unvaccinated participants 
with LC had significantly higher anti-N IgG levels than did historical, unvaccinated controls previously 
exposed to SARS-CoV-2 (Extended Data Fig. 5A). 


Linear models were constructed to more fully account for baseline differences (demographics, vaccines at 
blood draw [VAD]) across cohorts (Fig. 2B, Extended Data Fig. 5B), which revealed that LC state was a 
significant, positive predictor of anti-Spike humoral response after accounting for such differences 
(Extended Data Table 5). To gauge whether the elevated responses were to distinct regions of Spike, 
anti-SARS-CoV-2 IgG responses against linear peptides were profiled among vaccinated participants. LC 
participant responses were significantly higher than CC responses against a peptide that confers increased 
neutralization”’*’, corresponding to amino acid residues 556-572 (1.3; Outlier Sum, p = 0.031). 
Responses were also higher (1.4x—1.6x) for peptides corresponding to residues 572-586, 625-638, and 
682-690 (the furin cleavage site). CC participant responses were higher than LC ones against two S2 
peptides (residues 1149-1161, 1.5; 1256-1266, 2.1x) (Fig. 2C). Multiple differentially expressed Spike- 
binding motifs were mapped onto available trimeric-structure models of Spike (PDB: 6VXX). These 
mapped to highly surface exposed sites in the protein’s natural conformational state, near the S1 RBD 
(RDPQTLE and KFLPQQ) and the S1/S2 cleavage site (RSVAS, YECDIPIGAGICA, and YMSLG) 
(Fig. 2D), consistent with participants with LC having higher anti-Spike immune responses. By analysing 
peptide enrichment for Spike motifs corresponding to Protein-based Imnmunome Wide Association Study 
(PIWAS)-identified peaks, significantly greater humoral responses against KFLPFQQ (Kruskal-Wallis, 

p = 0.023) (Fig. 2E), RDPQTLE (p = 0.00058), and LDK[WY ]F (p = 0.0034) were found (Extended 
Data Fig. 5C). Prevalences of antibody reactivities against KFLPFQQ (Fisher’s exact, p = 0.0060), 
RDPQTLE (p= 0.00015), LDK[WY]F (p = 0.00066), and DISGI (p = 0.0086) were also significantly 
higher among participants with LC than among grouped controls (Extended Data Fig. 5D). Statistical 
modeling accounting for baseline differences (demographics, VAD) revealed LC significantly associated 
with reactivity against KFLPFQQ, RDPQTLE, and DISGI motifs (Extended Data Fig. 5E), but not with 
reactivity against LDK[WY]F (Extended Data Fig. 5E), which was elevated in both CC and LC groups 
(Extended Data Fig. 5C). 


Cortisol and soluble immune mediators 


Parallel multiplex analysis,of circulating hormones and immune mediators in plasma samples revealed 
groups in the MY-LC cohort significantly differed in median levels of cortisol (Kruskal-Wallis, 

p <0.0001); complement C4b (p = 0.0001); CCL19 (p= 0.00058); galectin-1 (p = 0.0015); CCL20 

(p = 0.0032); CCL4 (p = 0.0092); APRIL (p = 0.013); LH (p = 0.022); and IL-5 (p = 0.024). Post-hoc 
comparisons showed the LC group had significantly increased complement C4b, CCL19, CCL20, 
galectin-1, CCL4, APRIL, and LH; and marginally but significantly decreased IL-5 (Extended Data Fig. 
6A-H). Additional analysis revealed significant correlations with LCPS scores, particularly for cortisol 
(Extended Data Fig. 61). In the Ext. LC cohort (n= 53, excluding an outlier whose level was >8 standard 
deviations above median), cortisol levels in the LC group were lower than those in the HC and CC groups 
(Fig. 2F). Paired levels of adrenocorticotropic hormone (ACTH) were evaluated only in the MY-LC 
cohort; these did not significantly differ across groups (Fig. 2G). Median sample collection times 
significantly differed only between CC and LC groups, and this difference was modest (65 minutes; 
Dunn’s test, p= 0.027) (Fig. 2H). Subsequent statistical modeling revealed that LC status significantly 
associated with lower cortisol levels after accounting for individual differences in age, sex, BMI, sample- 
collection time, and cohort (MY-LC vs. Ext. LC) (Fig. 21, Extended Data Table 6). 
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Autoantibodies to exoproteome 


Next, antibody reactivity against extracellular proteins was assessed in 98 LC and 38 control participants 
using rapid extracellular antigen profiling (REAP), a method to measure antibody reactivity against 
>6,000 extracellular and secreted human proteins'®. Although participants with LC had a variety of 
private reactivities against diverse autoantigens (Fig. 3A), the number of autoantibody reactivities per 
participant did not differ across groups (Fig. 3B), nor did the number of reactivities significantly correlate 
with LC clusters (as assessed by LCPS scores) (Fig. 3C). Additionally, the number of autoantibody 
reactivities correlated with neither double-negative B-cell populations nor days from acute symptom.onset 
(Extended Data Fig. 7A,B). 


Given REAP studies showing functional autoantibodies are elevated in severe acute COVID-19'%, 
autoantibody reactivities were aggregated into clusters using a manually curated Gene Ontology process 
list relevant to LC. The magnitudes of reactivity for LC and control groups did not significantly differ in 
any category (Extended Data Fig. 5C). Several reports implicated stereotypical G protein-coupled 
receptor (GPCR) autoantibodies in LC pathogenesis*'*? (e.g., targeting beta‘adrenergie receptors or the 
angiotensin II receptor). While several GPCR-directed autoantibodies were detected in this study 
(Extended Data Fig. 7D), the number of GPCR reactivities for participants with LC did not differ from 
that for controls (Fig. 3D). Importantly, individual autoantibody reactivities were not significantly more 
frequent in either participants with LC or in controls (Fig. 3E). 


Antibody responses to herpesviruses 


Given emerging evidence for the role of latent virus reactivation in LC, three complementary approaches 
were used to examine anti-viral reactivity patterns inthe MY-LC cohorts: REAP, serum epitope 
repertoire analysis (SERA), and ELISA. Global anti-viral responses were first assessed by REAP, which 
measures antibody reactivity to 225 viral surface proteins (Supplementary Table 2). Reactivities against 
38 viral conformational epitopes were detected among 98 LC and 38 control participants (Extended Data 
Fig. 8A). For SARS-CoV2 reactivities, only participants who received two doses of vaccine were 
analysed. Reactivities against non—Omicron variant RBDs in the LC cohort were higher than those in the 
CC controls (Fig. 4A); however; as with ELISA, this trend was not significant. 


Differences in viral reactivities against non-SARS-CoV-2 antigens were striking (Fig. 4B). Participants 
with LC had elevated REAP scores for several herpesvirus antigens, including the Epstein-Barr virus 
(EBV) minor viral capsid antigen gp23 (p = 4.62E-3); the EBV fusion-receptor component gp42 

(p =3.2E-2); and the VZV glycoprotein E (p = 1.51E-2) (Extended Data Fig. 8B). Conversely, 
participants with LC had lower REAP scores for HSV-1 glycoprotein gL (p= 4.61E-6) and gD1, although 
the difference in gD1 reactivity was not significant. 


Next; the SERA platform (a commercially available random bacterial display library with unlimited 
multiplex capability) was used to orthogonally analyse non-SARS-CoV-2 antigens. SERA includes 
epitope panels representing 45 pathogens and disease markers, validated using a database of thousands of 
controls. Importantly, SERA revealed that cohorts significantly differed neither in estimated EBV 
seroprevalence (Fig. 4C) nor for any other tested viral pathogen (Extended Data Fig. 8C). 


First was assessed whether individuals with LC had higher EBV reactivities because of acute EBV 
infection. Anti-EBV IgM was not elevated in this group (as measured by SERA) (Extended Data Fig. 
8D) nor was there evidence of EBV viremia (Extended Data Fig. 8E,F), suggesting that the higher 
reactivity to EBV lytic antigens was more likely caused by recent EBV reactivation than by acute 
infection. Additionally, these results do not rule out EBV shedding at a local site, such as in saliva™*. 
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Next was assessed whether differences in baseline seropositivity affected EBV-antigen reactivity. EBV 
reactivity was analysed only in EBV-seropositive individuals as identified by SERA and by Identifying 
Motifs Using Next-generation sequencing Experiments (IMUNE). By REAP, seropositive participants 
with LC had significantly higher reactivity to EBV p23 (Kruskal-Wallis, p = 0.00095, Fig. 4D) and gp42 
(0.0039, Fig. 4E) than did seropositive controls. REAP measurements significantly correlated with 
ELISA measurements (R = 0.73, p $ 2.2E-16), orthogonally validating this finding (Extended Data Fig. 
8G). In an orthogonal screen of linear peptides with SERA, the LC cohort had greater reactivity against 
the gp42 linear peptide (PV XF[ND]K) (Kruskal-Wallis, p = 0.0031) (Fig. 4F). Mapping of this motif onto 
available structures of gp42 complexed with EBV gH/gL (PDB: 5T1D) showed these residues are 
exposed on the surface of EBV virions (Fig. 4G, pink residues). 


To investigate lower REAP reactivity to HSV-1 antigens observed in participants with LC, a. similar 
analysis was performed using only HSV-1 seropositive individuals, as identified by SERA. In these 
individuals, REAP scores for HSV-1 glycoprotein gD1 no longer differed among groups (Extended Data 
Fig. 8H). Post-hoc comparisons for HSV-1 gL also showed the groups did not significantly differ 
(Extended Data Fig. 81). These data suggested that the lower IgG reactivity to.gL in REAP (Fig. 4B) is 
probably caused by lower HSV-1 seroprevalence in the LC group. In aggregated initial REAP and SERA 
results, individuals with LC had elevated IgG reactivity to EBV and VZV surface antigens without 
evidence of EBV primary infection or acute viremia. 


Additional analysis revealed LCPS significantly correlated with humoral reactivity against neither gp42 
PVXF[ND]K nor EBV p23 antigens in EBV-seropositive individuals (Extended Data Fig. 8J,K). In 
contrast, reactivity to gp42 PVXF[ND]K correlated with IL-4/IL-6 producing CD4" T cells in EBV- 
seropositive individuals with LC (R = 0.26, p = 0.013) (Fig. 4H). This correlation was not observed in 
control groups. Furthermore, EBV p23 REAP reactivity significantly correlated with terminally 
differentiated effector memory (Timea) CD4" T cells (R = 0.26, p = 0.018) (Fig. 41), a subset of cells 
implicated in protection from CMV*. In contrasty anti-SARS-CoV-2 antibody levels did not correlate 
with IL-4/IL-6 double-positive CD4* T.célls (Extended Data Fig. 8L-O). 


Unique biological markers of Long COVID 


To further account for demographic differences among groups that might affect immunophenotypes, each 
LC participant was explicitly matched to a control participant by using a Gale-Shapley procedure based 
on participant age, sex, days from acute COVID-19 symptom onset, and vaccination status. Participants 
with LC did not differ significantly from controls in these criteria (Extended Data Fig. 9A), nor in 
severity of acute COVID-19 disease (whether hospitalisation was required) (Extended Data Fig. 9B). 
Principal component analysis embedding of matched participants with all collected immunological 
features clearly distinguished individuals with LC from controls (Fig. 5A). Consistent with this, k-nearest 
neighbours classification on the normalised features efficiently discriminated between groups, with an 
AUC of 0.94 (95% CI: 0.84-1.00) (Fig. 5B). Principal components regression of collated immunological 
data showed that flow cytometry (pseudo-R? = 59%) and plasma proteomics and hormones (pseudo- 

R? = 74%) were most informative for separating groups. A final parsimonious LASSO model similarly 
achieved good fit (pseudo-R? = 82%) (Fig. 5C). Of the features selected for the final model, several 
associated positively with LC status (serum galectin-1 concentration, IgG against various EBV epitopes); 
while others associated negatively (serum cortisol; PD-1* / CD4* Tem; cDC1 cells) (Fig. 5D). Preliminary 
external validation in the Ext. LC cohort of selected LASSO-model features revealed similar decreases in 
cortisol, but galectin-1 and EBV gp42 predicted LC status specifically in the MY-LC cohort (Extended 
Data Fig. 9C,D), potentially caused by clinical phenotype differences between the MY-LC and Ext. LC 
cohorts (Extended Data Fig. 9E). 


319 
320 
321 
322 
323 
324 
325 


326 


327 
328 
329 
330 
331 
332 
333 
334 
335 
336 
337 


338 
339 
340 
341 
342 
343 


344 
345 
346 
347 
348 
349 
350 
351 
352 
353 
354 
355 
356 
357 


358 
359 
360 
361 
362 
363 
364 
365 


Serum cortisol was the most significant predictor of LC status in the model, and cortisol alone achieved 
an AUC of 0.96 (95% CI: 0.93—0.99) (Extended Data Fig. 9F, top). Notably, serum cortisol in the MY- 
LC cohort was similar in the HC and CC controls, and lower in participants with LC (Extended Data 
Fig. 9F, bottom). When used alone, each of the other selected model features predicted status reasonably 
well (Extended Data Fig 9G,H). Last, classification accuracies of LCPS models substantially agreed 
with machine learning ones (Cohen’s Kappa, 0.79; 95% CI: 0.65—0.93), suggesting that both participant- 
reported outcomes and immunological features efficiently predict LC status (Extended Data Table 7). 


Discussion 


Studies of individuals with LC reported diverse changes in immune and inflammatory factors***’. In this 
study, exploratory analyses identified significant immunological differences between the LC population 
and demographically matched control populations more than a year from their acute infections: 
Circulating immune cell populations significantly changed. Populations of non-conventional monocytes, 
double-negative B cells, and IL-4/IL-6 secreting CD4 T cells increased; and those of conventional DC1 
and central memory CD4 T cells decreased. In addition, individuals with LC had higher levels of 
antibodies to SARS-CoV-2, EBV, and VZV antigens. In contrast, levels of individual autoantibodies to 
human exoproteome did not significantly differ. Marked differences in levels of circulating cytokines and 
hormones, particularly cortisol, were noted in participants with LC from both MY-LC and Ext. LC 
cohorts. Unbiased machine learning revealed several core predictive features of LC status within the MY- 
LC study, identifying potential targets for additional validation and future biomarker development. 


Multiple hypotheses have been proposed for LC pathogenesis, including persistent virus or viral 
remnants**, autoimmunity, dysbiosis, latent viral reactivation, and unrepaired tissue damage. The data in 
this study suggest persistent SARS-CoV-2 viral antigens, reactivation of latent herpesviruses, and chronic 
inflammation may all contribute to LC. Overall, our data are’ less consistent with an autoantibody- 
dominated disease process in LCs. Whether autoreactive T cells play a role in LC pathogenesis was not 
addressed and requires future investigation. 


Immune phenotyping of PBMC populations revealed participants with LC had notably higher levels of 
circulating non-conventional monocytes associated with various chronic inflammatory and autoimmune 
conditions*’. These participants also had significantly lower levels of circulating cDC1 populations, 
which are responsible for antigen presentation and cytotoxic T cell priming”. In addition, the number of 
CD4" Tem cells was significantly reduced and the absolute number of exhausted CD4* T cells was 
increased. Cerebral spinal fluid from individuals with LC also have elevated levels of TIGIT’ CD8* T 
cells, consistent with possibleimmune exhaustion*'. After stimulation, T cells from individuals with LC 
produced significantly more intracellular IL-2 (CD4, CD8), IL-4 (CD4), and IL-6 (CD8). Notably, subsets 
of participants with LG@.also had polyfunctional IL-4/IL-6 co-expressing CD4" T cells, which correlated 
with reactivity against EBV lytic antigens, but not against SARS-CoV-2 antigens. In aggregate, these 
findings may be consistent with Ty2-skewed CD4 T cell activation in response to EBV among 
participants with LC, as suggested for ME/CFS”. Levels of IgG against SARS-CoV-2 Spike and S1 in 
participants with LC were higher than those in vaccination-matched controls, consistent with persistent 
viral antigens“? 


Participants with LC from two sites had significantly decreased systemic cortisol levels; this remained 
significant after accounting for variations in demographics and sample-collection times. Interestingly, the 
decreased cortisol did not associate with a compensatory increase in ACTH levels, suggesting the 
hypothalamic-pituitary axis response to regulate cortisol may be inappropriately blunted. Importantly, 
ACTH has an extremely short half-life in plasma, which may impair accurately detecting changes. 
Dedicated studies must confirm these preliminary findings. Intriguingly, an earlier study of 61 survivors 
of SARS-CoV infection showed similar evidence of hypocortisolemia and blunted ACTH responses three 
months after acute disease*®. Furthermore, decreased cortisol levels during the early phases of COVID-19 
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were associated with development of respiratory LC symptoms”’. As cortisol is central for a variety of 
homeostatic and stress responses“, the current finding of persistently lower cortisol levels in those with 
LC more than a year after acute infection warrants further investigations. 


This study also revealed individuals with LC have elevated antibody responses against non-SARS-CoV-2 
viral antigens, particularly EBV antigens. EBV viremia occurs during acute COVID-19 in hospitalised 
patients and predicts development of persistent symptoms in the post-acute period*’. Other studies 
implicated recent EBV reactivation with LC development*”””. The observation here of elevated IgG 
against EBV lytic antigens suggests that recent reactivation of latent herpesviruses (EBV, VZV) may be a 
common feature of LC. 


Finally, machine learning models designed to accurately classify LC and control populations (after 
matching individuals to account for potentially confounding features, like sex, age, days from symptom 
onset, and vaccination status) identified multiple features that significantly predict LG. status. 
Classifications using only immunological data strongly agreed with classifications using survey scores 
(LCPS; Cohen’s Kappa, 0.764), showing the immunological analyses and patient-reported outcomes used 
here were highly concordant in diagnosing LC. 


This study has several limitations. Primary among these is that few participants were identified by 
convenience sampling and that recruitment strategies for cases differed.from those for controls. Though 
broadly covering diverse biological features, this study used far fewer independent observations than 
traditional machine learning studies use (several thousands) to robustly train and optimise classification 
models. This study was also restricted to analysing peripheral (circulating) immune factors from 
participants. As LC often presents with organ system—specific dysfunctions, greater analyses of local 
immune features would crucially extend these findings» Further, analysis of autoantibodies was restricted 
to the exoproteome. Whether autoantibodies to intracellular antigens or non-protein antigens participate in 
LC pathogenesis was not tested. 


In summary, significant biological differences were identified between participants with LC and 
demographically and medically matched:€C and HC participants, validating extensive reports of 
persistent symptoms by various individuals with LC and patient advocacy groups. This study provides a 
basis for future investigations into,the immunological underpinnings driving the genesis of LC. 
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Main Figure Legends 


Figure 1. Demographic and clinical stratification of participants with Long COVID. (A) Schematic of 
MY-LC study. Numbers indicate participants after exclusion (see ‘Methods ’). (B) Select demographics for 
LC (top.row, purple) and CC (bottom row, yellow) groups. Centre values in ‘Age’ column represent average 
group values (n= 40 CC, 99 LC). Statistical significance is reported for relevant post-hoc comparisons 
(‘Age’) or Chi-square tests (‘Sex’ and ‘Acute Disease Severity’). Complete statistical results are detailed 
in Extended Data Table 2. (C) Box plots of days from acute symptom onset between LC and CC groups. 
Significance was assessed using a two-tailed Brown-Mood median test with an alpha of 0.05 (n= 39 CC, 
99 LC). (D) Box plots of LCPS for each individual (n = 35 HC, 20 CC, 98 LC). Significance was assessed 
using Kruskal-Wallis tests corrected for multiple comparisons using Bonferroni’s method. (E) Prevalence 
of top 30 self-reported binary symptoms ranked from most prevalent (right) to least prevalent (left). 
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Symptoms are coloured according to common physiological system: Constitutional (Const., light green); 
Neurological (Neuro., blue); Pulmonary (Pulm., gold); Musculoskeletal (MSK, red); Gastrointestinal (GI, 
purple); Cardiac (dark green); Endocrine (Endo., pink); Ear, Nose, Throat (ENT, grey); and Sexual 
Dysfunction (Sex Dys., teal). For all box plots (C,D), central lines indicate group medians; top and bottom 
lines indicate 75" and 25" percentiles, respectively; whiskers represent 1.5 IQR; and individual data points 
mark outliers. Abbreviations: IOR, interquartile range; Long COVID propensity scores, LCPS. For (A), 
clockwise, from top left: HCW, historical, unvaccinated SARS-CoV-2 exposed controls; Ext. LC, external 
participants with Long COVID; CC, convalescent infected individuals without persistent symptoms; LC, 
convalescent infected individuals with persistent symptoms; HC, healthy controls with no prior exposure. 
For (E), top to bottom: EMR, electronic medical record; n.s., not significant; Dif., difficulty; UI, urination; 
Subj., subjective; Alt., altered; Decr., decreased; Abd., abdominal; reg., regulating; temp, »body 
temperature; Musc, muscle; Indig., indigestion. 


Figure 2. Exaggerated SARS-CoV-2 specific humoral responses and altered circulating immune 
mediators among Long COVID participants. (A) SARS-CoV-2 antibody responses-assessed by ELISA 
(n= 22 HC, 17 CC, 70 LC). Vaccination status for each cohort is indicated by “x2” indicating the number 
of SARS-CoV-2 vaccine doses at sample collection. Significance for difference in group medians was 
assessed using Kruskal-Wallis with FDR (Benjamini-Hochberg) for multiple comparison. Central lines 
indicate group median values; whiskers, 95% CI estimates. (B) Coefficients from linear models are reported 
for various outcomes. Model predictors are indicated on x-axis. Significant predictors (p< 0.05) are in 
purple. Detailed model results are reported in Extended Data Table 5. (C) PIWAS line profiles of IgG 
binding within participants with >1 vaccine dose plotted along SARS-CoV-2 Spike amino acid sequence. 
Various Spike protein domains are indicated by coloured boxes (top). 95" percentile values are arranged 
by group: LC (purple, n= 80), HC (orange, n= 39) and CC (yellow, n= 38) with peaks >2.5 PIWAS value 
annotated by their consensus linear motif sequence (bold):and surrounding residues. Significantly enriched 
peaks in LC group are marked (*), as calculated:by Outlier Sum statistics. (D) Three-dimensional mapping 
of LC-enriched motif sequences onto trimeric Spike'protein. (S1, light grey; NTD, light blue; RBD, red; 
and S2, dark grey, with various LC-enriched motifs annotated.) (E) Box plots of z-score enrichments for 
IgG binding to Spike sequence KFLPFQQ amongst participants who have received >1 vaccine doses. A z- 
score >3 indicates significant binding relative to control populations. Box plots of z-score transformed 
cortisol (F) ACTH (G), and sample-collection times (H) by group. Participants with potentially 
confounding medical comorbidities (e.g., pre-existing pituitary adenoma, adrenal insufficiency, oral steroid 
use) were removed before analysis. (1) Coefficients from linear models of cortisol levels. Significant 
predictors (p < 0.05) are.in purple. Detailed model results are reported in Extended Data Table 6. Box 
plots (E—H) are represented as.in Fig. 1. Significance for difference in group medians was assessed using 
Kruskal-Wallis with Bonferroni’s correction for multiple comparison. Abbreviations: ACTH, 
adrenocorticotropic-hormone; NTD, N terminal domain; O.S., Outlier Sum; PIWAS, Protein-based Im- 
munome Wide Association Study; RBD, receptor-binding domain; SP, signal peptide; VAD, vaccines at 
sample draw. 


Figure 3. Long COVID participants showed limited but selective autoantibodies against the human 
exoproteome. (A) Heatmap depicting REAP reactivities across the MY-LC cohort (n= 25 HC, 13 CC, 98 
LC). Each column is one participant, grouped by cohort (for HC and CC) or by LCPS (for LC). Column 
clustering within groups was performed by k-means clustering. Each row represents one protein. Proteins 
were grouped using Human Protein Atlas mRNA expression data for different tissues. Reactivities depicted 
have at least one participant with a REAP score >3. (B) The number of autoantibody reactivities per 
individual by group. Significance assessed by Kruskal-Wallis. Box plots are represented as in Fig. 1. (C) 
Correlation plot depicting the relationship between number of autoantibody reactivities per individual and 
LCPS. Correlation was assessed by Spearman. Black line depicts linear regression with 95% CI shaded. 
Colours depict LC LCPS group (cluster 1, red; cluster 2, green; cluster 3, blue). Each dot represents one 
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individual. (D) Violin plot depicting the number of GPCR autoantibodies per individual. Significance 
assessed by Kruskal-Wallis. Each dot represents one individual. (E) Assessment of the frequency of 
individual autoantibody reactivities in participants with LC and controls. Significance assessed by Fisher’s 
exact test. Y-axis depicts the negative logio of unadjusted p-value, with the Bonferroni adjusted significance 
threshold depicted by the black dashed line. X-axis depicts the difference in the proportion of autoantibody 
positive individuals in each group. Each dot represents one autoantibody reactivity. Abbreviations: CNS, 
central nervous system; GPCR, G-protein coupled receptor; Pit., pituitary. 


Figure 4. Long COVID participants demonstrate elevated levels of antibody responses to 
herpesviruses. (A) Violin plots depicting the REAP score distributions for SARS-CoV2 S1 RBD between 
LC (n= 69) and CC participants (n = 10) with two doses of mRNA vaccine. Statistical significance assessed 
by Wilcoxon rank-sum adjusted for multiple comparisons by Benjamini-Hochberg method. (B) Violin plots 
depicting the REAP score distributions for a given viral antigen between LC (n= 25:HC, 13 CC, 98 LC). 
Statistical significance assessed by Wilcoxon rank-sum adjusted for multiple comparisons by Benjamini- 
Hochberg method. Only antigens with >2 LC and =2 control individuals with reactivity were included. (C) 
Seropositivity as assessed by SERA for EBV amongst LC (n=99) and control participants (n= 78). 
Significance assessed by Fisher’s exact test adjusted for multiple comparisons by Benjamini-Hochberg 
method. (D,E) REAP scores amongst EBV-seropositive individuals only for EBV p23 (D) and gp42 (E) by 
group (n= 25 HC, 13 CC, 98 LC). (F) SERA-derived z-scores for gp42 motif PVXF[ND]K amongst EBV- 
seropositive individuals only plotted by group. Dashed line represents z-score threshold for epitope 
positivity defined by SERA (n=39 HC, 38 CC, and 80. LC).(G):Three-dimensional mapping of LC- 
enriched linear peptide sequence PVXF[ND]K (magenta) onto EBV gp42 (purple) in complex with gH 
(light grey) and gL (dark grey) (PDB: 5T1D). (H) Correlation plot depicting the relationship between EBV 
gp42 PVXF[ND]K z-score and percent IL-4/IL-6 CD4"T cells (of total CD4" T cells) for participants. Only 
EBV-seropositive individuals were included. Correlation assessed by Spearman. Black line depicts linear 
regression with 95% CI shaded. (n=39 HC, 38 CC, and 80 LC). (D Correlation plot depicting the 
relationship between EBV p23 REAP score and percent CD4* TEMRA (of total CD3* T cells). Only EBV- 
seropositive individuals were included. Correlation assessed by Spearman. Black line depicts linear 
regression with 95% CI shaded. Colours\depict LCPS Clusters as in Fig. 3. Box plots are represented as in 
Fig. 1. Statistical significance. of difference in medians determined by Kruskal-Wallis. Post-hoc tests 
performed using Dunn’s test with Holm’s method to adjust for multiple comparisons. Abbreviation: EBV, 
Epstein-Barr virus; REAP, rapid.extracellular antigen profiling; SERA, serum epitope repertoire analysis; 
TM, transmembrane domain. 


Figure 5. Biochemical factors differentiate participants with Long COVID from matched controls. 
All data shown represent a matched subset of participants (n= 41 HC, 40 CC, 81 LC) selected by a Gale- 
Shapley procedure on demographic factors (Extended Data Fig. 9A). (A) PCA projection of participant 
data comprising cytokine, flow cytometry, and various antibody responses (a-SARS-CoV-2, non-SARS- 
CoV-2 viral antibodies, and aAb). Marginal histograms display data density along each principal 
component dimension. (B) ROC curve analysis from unsupervised KNN classification. AUC and 95% CI 
intervals (DeLong’s Method) are reported. (C) McFadden’s pseudo R-squared are reported as bar plot for 
each data segment. An integrated, parsimonious McFadden’s pseudo R-squared is reported for the final 
classification model (‘All’). (D) LASSO regression identifies a minimal set of immunologic features 
differentiating participants with LC from others. Unlabeled dots are significant predictive features not 
included in the final LASSO regression model. Dots are coloured according to individual data segments: 
orange, Flow; blue, plasma cytokines; pink, viral epitopes; green, a-SARS-CoV-2; yellow, aAb. 
Abbreviations: aAb, autoantibodies; o-SARS-CoV-2, anti-SARS-CoV-2 antibodies; AUC, area under the 
curve, CI, confidence interval; Flow, flow cytometry; FPR, false positive rate; KNN, k-nearest neighbours; 
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LC, Long COVID,; PCA, principal component analysis; ROC, receiver operating characteristics; TPR, true 
positive rate. 


Methods 
Ethics Statement 


This study was approved by the Mount Sinai Program for the Protection of Human Subjects (IRB #20- 
01758) and Yale Institutional Review Board (IRB #2000029451 for MY-LC; IRB #2000028924 for 
enrollment of pre-vaccinated Healthy Controls; HIC #2000026109 for External Long COVID). Informed 
consent was obtained from all enrolled participants. 


MY-LC Study Design, Enrollment Strategy, and Inclusion / Exclusion Criteria 


MY-LC was a cross-sectional, multi-site study comprised of five different groups with differing SARS- 
COV-2 exposure histories and varied Long COVID status. Participants enrolled in the Long COVID 
group underwent complete medical evaluations by physicians to rule out alternative medical etiologies for 
their persistent symptoms before study enrollment. 


Participants with persistent symptoms following acute COVID-19 were recruited from Long COVID 
clinics within the Mount Sinai Healthcare System and the Center for,Post COVID Care at Mount Sinai 
Hospital. Participants enrolled in healthy and convalescent study arms were recruited via IRB-approved 
advertisements delivered through email lists, study flyers located in hospital public spaces, and on social 
media platforms. Informed consent was provided by all participants at the time of enrollment. All 
participants provided peripheral blood samples and completed symptom surveys the day of sample 
collection (described below). Self-reported medical histories for all MY-LC participants were also 
collected at study visits and further reviewed through examination of electronic medical records by 
collaborating clinicians. 


Inclusion criteria for individuals in the Long COVID group (“LC”) were age > 18 years; previous 
confirmed or probable COVID-19 infection (according to World Health Organization guidelines®'); and 
persistent symptoms > 6 weeks following initial COVID-19 infection. Inclusion criteria for enrollment of 
individuals in the healthy control group (“HC”) were age = 18 years, no prior COVID-19 infection, and 
completion of a brief, semi-structured verbal screening with research staff confirming no active 
symptomatology. Inclusion criteria for individuals in the convalescent control group (“CC”) were age = 
18 years; previous confirmed or probable prior COVID-19 infection; and completion of a brief, semi- 
structured verbal screening with research staff confirming no active symptomatology. 


Pre-specified exclusion criteria for this study were inability to provide informed consent; and any 
condition preventing,a blood test from being performed. Additionally, all participants had electronic 
health records reviewed by study clinicians following enrollment and were subsequently excluded prior to 
analyses for the following reasons: (1) current pregnancy, (2) immunosuppression equivalent to or 
exceeding prednisone 5 mg daily, (3) active malignancy or chemotherapy, and (4) any monogenic 
disorders. For specific immunological analyses, pre-existing medical conditions were additionally 
excluded prior to analyses due to high potential for confounding (e.g., participants with hypothyroidism 
were excluded prior to analysis of circulating T3/T4 levels; participants with pituitary adenomas were 
excluded prior to analysis of cortisol levels). Specific exclusions are marked by “A” in figures and 
detailed in relevant legends. 


The recruitment of individuals in healthcare worker group (HCW) is described at length elsewhere”. 


Individuals in the external Long COVID group (Ext. LC) were identified from The Winchester Center for 
Lung Disease's Post-COVID-19 Recovery Program at Yale New Haven Hospital by collaborating 
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clinicians. All participants underwent medical evaluation for persistent symptoms following COVID-19. 
Participants from this group were identified retrospectively for inclusion in the MY-LC study according 
to the established MY-LC protocol: age = 18 years; previous confirmed or probable COVID-19 infection 
(according to World Health Organization guidelines*’); and persistent symptoms > 6 weeks following 
initial COVID-19 infection. 


Participant Surveys 


A comprehensive suite of surveys was administered to MY-LC study participants, combining validated 
patient-reported outcomes (PROs) with custom, purpose-developed tools by the MY-LC study team. 
Baseline demographic data collected from surveys included gender, age, body mass index (BMI), race, 
and medical comorbidities. Additionally, participants in the Long COVID and convalescent groups were 
asked to provide COVID-19 clinical data including date of symptom onset and acute disease severity 
(non-hospitalized vs. hospitalized), any SARS-CoV-2 polymerase chain reaction (PCR) diagnostic testing 
results, and any SARS-CoV-2 antibody testing results. Finally, all participants were.asked to report 
SARS-CoV-2 vaccination status including date of vaccinations and vaccine brand. 

At the time of blood collection, all participants completed PROs for fatigue (Fatigue Severity Scale (FSS) 
3, fatigue visual analogue scale [F-VAS]), post-exertional malaise (DePaul Symptom Questionnaire Post- 
Exertional Malaise Short Form [DSQ-PEM Short Form])**, breathlessness (Medical Research Council 
[MRC] Breathlessness Scale‘), cognitive function (Neuro-QOL v2.0 Cognitive Function Short Form**), 
health-related quality of life (HRQoL) (EuroQol EQ-5D-5L”), anxiety (GAD-7*), depression (PHQ-2*), 
pain (P-VAS), sleep (Single-Item Sleep Quality Scale”), as wellhas pre- and post-COVID-19 employment 
status (author-developed). Lastly, participants in the MY-LC study were asked to self-report any current 
persistent symptoms from a study-provided list. 


All survey data were collected and securely stored using REDCap*'* (Research Electronic Data Capture) 
electronic data capture tools hosted within the Mount:Sinai Health System. 


Long COVID Propensity Score (LCPS) 


Calculation of propensity scores for each participant was achieved through construction of a multivariable 
logistic regression model generated. with Long COVID vs. “Others” (Healthy Controls + Convalescent 
controls) as the outcome. The model candidate variables included survey responses from the following 
instruments described previously: FSS, F-VAS, DSQ-PEM Short Form, MRC Breathlessness Scale, 
Neuro-QOL v2.0 Cognitive Function Short Form, EQ-5D-5L, GAD-7, PHQ-2, P-VAS, Single-Item Sleep 
Quality Scale. Modelselection using Akaike’s Information Criteria (AIC) was used to select the final, 
parsimonious model. Odds ratios from the final model were normalized by dividing them by their 
respective standard error (SE) and rounding off to the nearest integer. These integer values were 
considered the score items for these specific variables and a cumulative prediction score for each subject 
was calculated by summation (Equation 1, below). As the score did not significantly differ between 
healthy controls and convalescent controls, the two control groups were combined as a single group 
(“Others”) for final analysis. A ROC curve analysis was performed to identify the optimal cutoff for the 
LCPS using the maximum value of Youden’s index J for Long COVID vs Others. A 10-fold cross- 
validation was used for internal validation and to obtain 95% confidence interval (CI) for the area under 
the curve (AUC). Data were analyzed using Stata version 16 (StataCorp, College Station, Texas). 


Equation 1: LCPS = 7+) GAD+1*) MRC+2*) PHQ2 +3) EQ5 +28» ) FSS 


Blood Sample Processing 
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Whole blood was collected in sodium-heparin-coated vacutainers (BD 367874, BD Biosciences) from 
participants at Mount Sinai Hospital in New York City, New York. Following blood draw, all participant 
samples were assigned unique MY-LC study identifiers and de-identified by clinical staff. Samples were 
couriered directly to Yale University in New Haven, CT the same day as the sample collection. Blood 
samples were processed on the same day as collection. Plasma samples were collected after centrifugation 
of whole blood at 600*g for 10 minutes at room temperature (RT) without brake. Plasma was then 
transferred to 15-mL polypropylene conical tubes, aliquoted, and stored at —80 °C. The peripheral blood 
mononuclear cell (PBMC) layer was isolated according to the manufacturer’s instructions using SepMate 
tubes (StemCell). Cells were washed twice with phosphate-buffered saline (PBS) before counting: 
Pelleted cells were briefly treated with ACK lysis buffer (ThermoFisher) for 2 minutes and then counted. 
Viability was estimated using standard Trypan blue staining and a Countess II automated cell counter 
(ThermoFisher). PBMCs were stored at —80 °C for cryopreservation or plated directly for flow cytometry 
studies. Plasma samples from the External Long COVID group were obtained using BD Vacutainer CPT 
tubes (#362753) according to manufacturer’s instructions and stored in aliquots at—80 °C prior to 
analysis. 


Flow cytometry 


Freshly isolated PBMCs were plated at 1—2 x 10° cells per well in a 96-well U-bottom plate. Cells were 
resuspended in Live/Dead Fixable Aqua (ThermoFisher) for 20 min.at 4°C. Cells were washed with PBS 
and followed by Human TruStain FcX (BioLegend) incubation for 10 min at RT. Cocktails of staining 
antibodies were added directly to this mixture for 30 minutes at RT. Prior to analysis, cells were washed 
and resuspended in 100 pl 4% PFA for 30 min at 4 °C. For intracellular cytokine staining following 
stimulation, the surface marker-stained cells were resuspended in 200 ul cRPMI (RPMI-1640 
supplemented with 10% FBS, 2 mM L-glutamine, 100 U/ml penicillin, and 100 mg/ml streptomycin, | 
mM sodium pyruvate) and stored at 4 °C overnight. Subsequently, these cells were washed and stimulated 
with 1x Cell Stimulation Cocktail (eBioscience)in 200.1 cRPMI for 1 h at 37 °C. 50 wl of 5x 
Stimulation Cocktail in cRPMI (plus protein transport 442 inhibitor, eBioscience) was added for an 
additional 4 hours of incubation at 37 °C. Following stimulation, cells were washed and resuspended in 
100 ul 4% paraformaldehyde for 30 min at 4°C. To quantify intracellular cytokines, cells were 
permeabilized with 1x permeabilization buffer from the FOXP3/Transcription Factor Staining Buffer Set 
(eBioscience) for 10 min at 4 °C, All subsequent staining cocktails were made in this buffer. 
Permeabilized cells were then washed and resuspended in a cocktail containing Human TruStain FcX 
(BioLegend) for 10 min at 4 °C, Finally, intracellular staining cocktails were added directly to each 
sample for 1 h at 4°C. Following this incubation, cells were washed and prepared for analysis on an 
Attune NXT (ThermoFisher). Data were analyzed using FlowJo software version 10.8 software (BD). 
Antibody information can be seen in Supplementary Table 1. 


A PERMANOVA test was used to assess the relationship between all circulating immune cell populations 
presented in Extended Data Fig. 2 and participant age, sex, Long COVID status, as well as BMI. The 
PERMANOVAbtest was run using the “VEGAN” package in R®. 


SARS-CoV-2 antibody testing by ELISA 


ELISAs were performed as previously described!>. Briefly, Triton X-100 and RNase A were added to 
plasma samples at final concentrations of 0.5% and 0.5 mg/ml, respectively, and incubated at room 
temperature for 30 minutes prior to use to reduce risk from any potential virus in plasma. MaxiSorp plates 
(96 wells; 442404, Thermo Scientific) were coated with 50 ul per well of recombinant SARS-CoV-2 
Total S (SPN-C52H9-100 pg, ACROBiosystems), RBD (SPD-C52H3-100 tng, ACROBiosystems) and the 
nucleocapsid protein (NUN-C5227-100 ug, ACROBiosystems) at a concentration of 2 g/ml in PBS and 
were incubated overnight at 4 °C. The coating was removed, and plates were incubated for 1 hour at room 
temperature with 200 ul of blocking solution (PBS with 0.1% Tween-20 and 3% milk powder). Plasma 
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was diluted serially at 1:100, 1:200, 1:400 and 1:800 in dilution solution (PBS with 0.1% Tween-20 and 
1% milk powder), and 100 ul of diluted serum was added for 2 hours at room temperature. Human anti- 
spike (SARS-CoV-2 human anti-spike [AM006415; 91351, Active Motif] and anti-nucleocapsid SARS- 
CoV-2 human anti-nucleocapsid (1A6; MA5-35941, Active Motif) were serially diluted to generate a 
standard curve. Plates were washed three times with PBS-Tween (PBS with 0.1% Tween-20) and 50 ul of 
HRP anti-human IgG antibody (1:5,000; A00166, GenScript) added to each well in dilution solution. 
After 1 hour of incubation at room temperature, plates were washed six times with PBS-Tween. Plates 
were developed with 100 ul of TMB Substrate Reagent Set (555214, BD Biosciences) and the reaction 
was stopped after 5 min by the addition of 2N sulfuric acid. Plates were then read at an 
excitation/emission wavelength of 450 nm and 570 nm. 


Multiplex proteomic analysis 


Participant plasma was isolated and stored at -80 °C as described above. Plasma was:shipped to Eve 
Technologies (Calgary, Alberta, Canada) on dry ice and analytes were measured using the following 
panels: Human Cytokine/Chemokine 71-plex Discovery Assay (HD71), Steroid/Thyroid 6plex Discovery 
Assay (STTHD), TGF-Beta 3-plex Discovery Assay (TGFB1-3), Human Myokine Assay (HMYOMAG- 
10), Human Neuropeptide Assay (HNPMAG-05), Human Pituitary Assay (HPTP1L), Human Cytokine P3 
Assay (HCYP3-07), Human Cytokine Panel 4 Assay (HCYP4-19), Human Adipokine Panel 2 Assay 
(HADK2-03), Human Cardiovascular Disease Panel Assay (HDCVD9); Human CVD2 Assay (HCVD2- 
8), Human Complement Panel Assay (HDCMP1), Human Adipokine Assay (HDADKS). Analysis of 
plasma proteomics was completed in two batches with internal. controls in each shipment to assess for and 
correct any analyte batch effects (described below) 


To integrate analytes across batches, two samples fromthe same representative individuals from each 
group (2 from LC, 2 from CC, and 2 from HC) were measured in each analysis batch. The median 
difference between all paired samples between the first and second batch was used as an additive 
corrective factor to integrate samples across batches»After batch integration, each feature was z-scored 
using all measurements across both batches. Following z-scoring, features that were found to have 
persistent batch effects, as defined by a Wilcoxon tank sum test p< 0.05 post-correction, were not 
considered for downstream analysis. 


Real-time Taqman assay for detection of EBV DNA 
Nucleic Acid Extraction 


Nucleic acid was extracted from 200uL freeze-thawed serum using the MagMAX Viral/Pathogen Nucleic 
Acid Isolation Kit (ThermoFisher, #A42352), automated on the KingFisher Flex (Thermo Fisher 
Scientific, Waltham, MA, USA) per manufacturer’s protocol. The manufacturer's protocol was 
additionally modified to reduce salt carry-over by adding a third wash step with 500 wL 80% ethanol and 
eluting in SO pL nuclease-free water. 


Real-time. Taqman PCR 


PCR primers for the TaqMan assay were previously validated™: EBV p143 forward (5'- 
GGA.ACC.TGG.TCA.TCC.TTG.C) and EBV p143 reverse (5' ACG.TGC.ATG.GAC.CGG.TTA.AT) 
(Thermo Fisher Scientific, Waltham, MA, USA). A fluorogenic probe (5’-(FAM)-CGC AGG CAC TCG 
TAC TGC TCG CT-(MGB)-3’) with a FAM reporter molecule attached to the 5’ end and an MGB 
quencher linked at the 3’ end was acquired in parallel (Thermo Fisher Scientific). The PCR amplification 
was performed in a 20-ul volume containing 10 wL 2* Luna Universal Probe One-Step RT-qPCR Kit 
(New England BioLabs, Ipswich, MA, US), 300 pmol of each primer per pl, 200 pmol of the TaqMan 
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probe, and 5 ul of isolated DNA. Amplification and detection were performed on a CFX96 Touch 
instrument (Bio-Rad, Hercules, CA, US). After a 1-minute hold step at 95 °C, the PCR cycling program 
consisted of 42 two-step cycles of 15 s at 95 °C and 30 s at 60 °C. Real-time measurements were taken, 
and a threshold cycle (C:) value for each sample was calculated if fluorescence exceeded a threshold limit. 
Each specimen was run in duplicate and was considered positive only if both replications were above the 
threshold limit. Each run contained multiple H2O controls (no template), and a standard curve containing 
serial dilutions of quantitative synthetic DNA (described below, ATCC, VR-3247SD). An additional 
EBV Plasma Control was included as a positive control for each assay plate (Thermo Fisher Scientific, 
#961231). 


Estimating Genome Copy Number Standards 


For standardization of qPCR detection of EBV viral genomes from participant plasma, a commercially 
available standard containing 5.59 x 108 EBV genome copies per ml (ATCC, VR-3247SD) was used. 
Serial log dilutions of this standard, ranging from 10°to 10° copies per ml, were made to establish the 
sensitivity of the TaqMan RT-PCR and included on each assay plate. The standard curve was created in 
the usual way by plotting the C; values against the standard of known concentration. x-y scatter diagrams 
were drawn, and the correlation coefficient (7”) was determined. Linear regression analysis was done 
using GraphPad Prism. 


Linear Peptide Profiling 
SERA serum screening 


A detailed description of the SERA assay has been published®. For this study, plasma was incubated with 
a fully random 12-mer bacterial display peptide library (1 x 10'° diversity, 10-fold oversampled) at a 1:25 
dilution in a 96-well, deep well plate format. Antibody-bound bacterial clones were selected with 50 wL 
Protein A/G Sera-Mag SpeedBeads (GE Life Sciences, #17152104010350) (IgG). The selected bacterial 
pools were resuspended in growth mediaand incubated at 37 °C shaking overnight at 300 RPM to 
propagate the bacteria. Plasmid purification, PCR amplification of peptide-encoding DNA and barcoding 
with well-specific indices was performed.as described. Samples were normalized to a final concentration 
of 4nM for each pool and run on the Illumina NextSeq500. Every 96-well plate of samples processed for 
this study contained healthy control run standards to assess and evaluate assay reproducibility and 
possible batch effects. 


For IgM isotype screening by SERA, the above IgG protocol was adjusted as follows: 1) after serum 
incubation with the library, F. coli cells were centrifuged, the supernatant removed, and the cells were 
resuspended in 500 nL 1x PBS containing a 1:100 dilution of biotin-SP (long-spacer) conjugated donkey 
anti-Human IgM secondary antibody (Jackson Immunoresearch, 709-066-073); 2) The plate was 
incubated for lL. hour at 4 °C with orbital shaking (800 rpm), the cells were again centrifuged, supernatant 
removed, and cells were resuspended in 700 nL of 1x PBS + 100 nL of Dynabeads MyOne streptavidin 
TI coated magnetic beads (ThermoFisher Scientific, 65601); 3) The plate was incubated for | hour at 4 
°C with orbital shaking (800 rpm), after which time the plate was magnetized and the beads-antibody 
complex along with their bound peptide-bearing cells were captured. All subsequent steps were identical 
for IgG and IgM screening as described. IgM antibody repertoires were evaluated for Epstein-Barr virus 
antibodies in two ways; 4) Samples were analyzed on an existing EBV IgM epitope panel that was 
developed and validated on clinically confirmed mononucleosis patients and EBV IgM negative controls. 


PIWAS analysis 
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The published PIWAS method” was used to identify antigen and epitope signals against the Uniprot 
reference SARS-CoV-2 proteome (UP000464024). For each sample, approximately 1-3 million 12-mers 
were obtained from the SERA assay and these were decomposed into constituent 5- and 6-mers. An 
enrichment score for each k-mer was calculated by dividing the number of unique 12-mers containing the 
k-mer divided by the number of expected k-mer reads for the sample, based on amino acid proportions in 
the sample. A z-score per k-mer was then calculated by comparing the enrichment score with those froma 
large pre-pandemic cohort (n = 1,500) on a log10 scale. A PIWAS value at each amino acid position 
along a protein sequence represents an averaged score within a 5 amino acid frame using the tiling z- 
scores of 5-mers and 6-mers spanning the sequence. 95th quantile bands were calculated based oneach 
population separately. 


Protein-wide identification of epitopes (PIE) 


PIE methodology for epitope identification was performed to locate regions on a protein sequence that 
had stronger outlier signals in the case samples relative to control samples from a large pre-pandemic 
cohort (n = 1,500). The distribution of case sample values relative to the control was analyzed at each 
amino acid position on the SARS-CoV-2 Spike protein sequence. Specifically, at each position, all case 
and control sample values were normalized using median absolute deviation based on the distribution of 
the control sample values. An outlier threshold was defined as Q75 +1.5%*(Q7s-Qos), where Q, is the x™ 
percentile of the control values at that specific position®’. An outlier sum statistic was then calculated as 
the sum of signal values above the outlier threshold in the case samples. A null distribution for the 
outlier sum value was calculated by permuting case/control labels and recalculating 1000 times. A p- 
value was calculated based on a z-score by comparing the observed outlier sum statistic to the null 
distribution. A significant p-value threshold was set to 0.001 after false discovery rate (FDR) adjustment 
by the Benjamini—Hochberg procedure and an outlier sum*threshold was set to the 99.5" percentile value 
of all positions with FDR adjusted p-value > 0.001. Allsequence positions that exceeded both thresholds 
were identified, and adjacent positions were merged into regions representing epitopes on the protein. 


IMUNE-based motif discovery 


Peptide motifs representing epitopes or mimotopes of SARS CoV-2-specific antibodies were discovered 
using the IMUNE algorithm. A total of 164 antibody repertoires from 98 hospitalized subjects from the 
Yale IMPACT study were used for motif discovery. The majority of subjects were confirmed SARS 
CoV-2 positive by NAT. IMUNE compared ~30 disease repertoires with ~30 pre-pandemic controls and 
identified peptide patterns that were statistically enriched (p-value < 0.01) in >25% of disease and absent 
from 100% of controls. Multiple assessments were run with different subsets of cases and controls. 
Peptide patterns identified by IMUNE were clustered using a point accepted mutation 30 (PAM30) matrix 
and combined into motifs. The output of IMUNE included hundreds of candidate IgG and IgM motifs. A 
motif was classified as positive in a given sample if the enrichment was >3 times the standard deviation 
above the mean of the training control set. The candidate motifs were further refined based on at least 
98% specificity. The final set of motifs was validated for sensitivity and specificity on an additional 1,500 
pre-pandemie controls and 406 unique confirmed COVID-19 cases from four separate cohorts. 


Motif grouping by similarity 


For SARS-CoV-2, motifs were grouped if they shared at least 3 of 5 amino acid identities, resulting in 76 
motifs being assigned into 24 groups. The motif within an epitope group with the greatest sensitivity and 
mean enrichment was included in the SARS-CoV-2 Infection IgG panel results. In some cases, two motifs 
were selected from the same group since their combination improved sensitivity. The remaining motifs 
that did not fall into a group were further down-selected based on a specificity of >99.5%, leaving 24 
additional motifs. 
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Rapid Extracellular Antigen Profiling (REAP) 
REAP Library Expansion 


The initial yeast library (Exo201) was generated as previously described!*”. In Exo201, only extracellular 
domains >49 amino acids in length were included in the library. To generate the library used for this 
study, Exo201 was with all extracellular domains of multi-pass membrane proteins greater than 15 amino 
acids and 225 extracellular viral antigens. DNA for new antigens was synthesized as either a Gene 
Fragment (for antigens over 300 nucleotides) or as an Oligo pool by TWIST Bioscience, containing a 5’ 
sequence (CTGTTATTGCTAGCGTTTTAGCA) and 3’ sequence (GCGGCCGCTTCTGGTGGC) for 
PCR amplification. The oligo pool was PCR amplified and transformed into yeast with barcode 
fragments, followed by barcode-antigen pairing identification as previously described!”. This new yeast 
library was then pooled with the initial library (Exo201) in the ratio of 1:1 to generate the new version of 
the library (Exo205) which contained 6,452 unique antigens. 


REAP Protocol 


Participant IgG isolation and REAP selections were performed as previously described'*”. Briefly, IgG 
was purified from participant plasma using protein G magnetic beads followed by adsorption to yeast 
transformed with the pDD003 empty vector to remove yeast-reactive IgG. The Exo205 yeast library was 
induced in SGO-Ura medium, and 108 induced yeast cells;were washed with PBE and added to wells of a 
sterile 96-well plate. 10 ug of purified participant IgG was added to the yeast library in duplicate in 100 
uL PBE and incubated for 1 hour at 4C. Yeast cells were washed with PBE and incubated with 1:100 
biotin anti-human IgG Fc antibody (clone QA19A42, Biolegend) for 30 minutes. Yeast cells were washed 
with PBE and incubated with a 1:20 dilution of Streptavidin MicroBeads (Miltenyi Biotec) for 30 
minutes. Yeast were resuspended in PBE and IgG-bound yeast were isolated by positive magnetic 
selection using the MultiMACS M96 Separator (Miltenyi Biotec) according to manufacturer 

instructions. Selected yeast were resuspended in 1 mL SDO -Ura and incubated at 30 °C for 24 hours and 
then plasmid DNA was harvested for NGS analysis. Briefly, DNA was extracted from yeast libraries 
using Zymoprep-96 Yeast Plasmid Miniprep kits or Zymoprep Yeast Plasmid Miniprep II kits (Zymo 
Research) according to standard manufacturer protocols. A first round of PCR was used to amplify a 
DNA sequence containing the protein display barcode on the yeast plasmid. A second round of PCR was 
performed on 1 wL step PCR product using Nextera i5 and i7 dual-index library primers (Illumina). 
PCR products were pooled; run on a 1% agarose gel, and DNA corresponding to the band at 257 base 
pairs was cut. DNA (NGS library) was extracted using a QIAquick Gel Extraction Kit (Qiagen) according 
to standard manufacturer protocols. NGS library was sequenced using an Illumina NextSeq550 and an 
NextSeq high output sequencing kit with 75 base pair single-end sequencing according to standard 
manufacturer protocols. Approximately 500,000 reads (on average) per sample was collected and the pre- 
selection library was sampled at ten times greater read depth than other samples. Samples with less than 
50,000 reads. were classified as a sequencing failure and removed from further analysis. 


REAP data analysis 


REAP scores were calculated as previously described'*”. Briefly, barcode counts were extracted from raw 
NGS data using custom codes and counts from technical replicates were summed. Next, aggregate and 
clonal enrichment was calculated using edgeR’' and custom computer scripts. Aggregate enrichment is the 
log. fold change of all barcodes associated with a particular protein summed in the post-library relative to 
the pre-library, with zeroes in the place of negative fold changes. Log: fold change values for clonal 
enrichment were calculated in an identical manner, but barcode counts across all unique barcodes 
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associated with a given protein were not summed. Clonal enrichment for a given reactivity was defined as 
the fraction of clones out of total clones that were enriched (log, fold change > 2). Aggregate (E.) and 
clonal enrichment (E.) for a given protein, a scaling factor (B.) based on the number of unique yeast 
clones (yeast that have a unique DNA barcode) displaying a given protein, and a scaling factor (B;) based 
on the overall frequency of yeast in the library displaying a given protein were used as inputs to calculate 
the REAP score, which is defined as follows: 


Equation 2: REAP score = E, x (Ec)? Bu x Br 


B. and B; are logarithmic scaling factors that progressively penalize the REAP score of proteins with low 
numbers of unique barcodes or low frequencies in the library, and are described in detail in previous 
publications'*”. 


Antigens with an average REAP score greater than 0.5 across all samples were defined as non-specific 
and excluded from further analysis. Autoantibody reactivities were defined as antigens with REAP score 
greater than or equal to 1. 


REAP Antigen ELISA Validation 


96-well MaxiSorp plates (Thermo Scientific #442404) were coated with 200 ng per well of recombinant 
EBV p23 protein (ProSpec #ebv-274) in PBS and incubated,overnight at 4 °C. Plates were dumped out 
and incubated with 3% Omniblock non-fat dry milk (American Bioanalytical #AB10109-00100) in PBS 
for 2 hours at RT. Plates were washed 3x with 200 ul wash buffer (PBS 0.05% Tween). Samples were 
diluted in 1% Omniblock non-fat dry milk in PBS and added to the plate to incubate 2 hours at RT. Plates 
were washed 6 with wash buffer. Goat anti-human IgG Fc HRP (Sigma Aldrich, #AP112P) diluted 
1:10000 in 1% Omniblock non-fat dry milk in PBS.was)added to the plates and incubated 1 hour at RT. 
Plates were washed 6x. Plates were developed with 100 ul of TMB Substrate Reagent Set (BD 
Biosciences #555214) and the reaction was.stopped after 5 min by the addition of 2 N sulfuric acid. Plates 
were then read at a wavelength of 450 nm. 


Machine Learning 
Data Preprocessing. 


All collected data forimmune profiling were collated. Features containing redundant information were 
manually removed fromthe dataset (e.g., nested flow cytometry populations include only the extant 
population). 


All features were linearly scaled to unit variance and zero-centered using the R programming language 
base libraries’. Median absolute deviation was calculated for each feature across all samples, with 
missing values removed. Features with a median absolute deviation equal to zero or features where data 
was not available in at least half the samples were not included in downstream analysis. Prior to 
visualization of the data using principal component analysis, features were additionally quantile- 
normalized using the “normalize.quantiles” function of the “preprocessCore” package in R™. 


Gale-Shapley matching of participants by demographics. 
To ensure that immunologic features of participants in the LC cohort would be compared against the most 


similar set of controls in the CC and HC cohorts, a Gale-Shapley matching procedure was employed”. 
Participants in the LC cohort were first matched against participants in the CC cohort. Unmatched 
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participants in the LC cohort were subsequently matched against participants in the HC cohort. Preference 
lists required by the Gale-Shapley algorithm were determined using an affinity function calculated as the 
cosine similarity of participants in a unit scaled and zero centered demographics matrix containing age, 
sex, vaccination status, and days from the initial onset of acute COVID-19. The matching was performed 
by the “galeShapley.marriageMarket” function of the “matchingR” package in R”. To evaluate matching 
efficacy, differences between groups in age, sex, vaccination status, acute COVID-19 hospitalization 
status, and days from initial onset of acute COVID-19 were assessed using a Chi-square test. For age, 
participants were segmented into groups as either less than 32 years of age, between 33 and 51 years of 
age, or greater than 52 years of age. For days from symptom onset, participants were segmented into 
groups as either 1-2 months from acute infection, 2—5 months from acute infection, 6-8 months from 
acute infection, or >9 months from acute infection. An alpha of 0.05 was used throughout. 


Unsupervised Analysis. 


Principal component analysis was performed on the set of normalized features for all matched participants 
7. To assess how well participants were grouped by all features, a k-nearest neighbor classifier with k=10 
was applied separating participants with Long COVID from those without (either convalescent 
participants or healthy controls). A k of 10 was chosen by heuristic as approximately equal to the square 
root of the number of samples included”. A range of values for k from 5-15 were tested and found to give 
similar results. Area under the receiver operating characteristic curve (AUC) and 95% confidence 
intervals were calculated using DeLong's method; p-values were calculated using the Mann-Whitney U 
statistic”, 


Supervised Analysis. 


Principal components regression was applied to each of.a predefined set of data segments: autoantibodies, 
SARS-CoV-2 antibodies, non-SARS-CoV-2 viral antibodies, plasma proteomics, and flow cytometry 
readouts. The precise definitions of these.data segments are provided as metadata. The first n principal 
components based on explained variance (see below for selection method) were selected from the 
normalized feature set and used to fit a logistic regression model (implemented as a binomial generalized 
linear regression with a logit link) for classification of participants with Long COVID as compared to 
matched convalescent participants without long term symptoms and uninfected controls. 


To determine the optimal-value for n (number of principal components), values were scanned, and seven- 
fold cross validation.was performed on the data set. The average mean squared error was calculated for 
each cross-validation iteration at a particular value of n. For the binomial regression run using a logit link 
function, McFadden's pseudo-R? was calculated and averaged across each of the cross-validation folds. 


Plots of explained;variance and mean squared error across all scanned values for n were generated and 
visually inspected to choose an optimal value for n that maximized explained variance while minimizing 
overfitting as identified by increasing average mean squared error. This procedure was performed on each 
of the segments, and an optimal n was chosen for each of the following: autoantibodies (n = 5), SARS- 
CoV-2,antibodies (n = 3), non-SARS-CoV-2 viral antibodies (n = 33), plasma proteomics (n = 20), and 
flow cytometry (n= 21). 


A model fitted on the first n principal components (or any linear transformation) was related to each of 
the original features as follows. Each principal component may be considered as a weighted linear 
combination of the original features. The principal component loading vectors were used to project the 
fitted beta values from the logistic regression model using the linearity of expectation, E(X + Y) = E(X) + 
E(Y), such that the estimated parameter for each variable was the weighted sum of the parameter 
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estimates for the principal components to which it contributed. The variance of fit for each of the original 
features was similarly projected from the fitted principal components as the variance of a sum of random 
variables Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y). P-values were calculated for each variable in the 
original feature space using z-scores. 


Following per-segment model construction and evaluation, features with a Bonferroni-corrected p-value 
of less than 0.05 were selected for inclusion in a final principal components regression. These selected 
features were considered as a separate integrated data segment and processed in the same way as each 
individual data segment with a selected (7 = 9) number of included principal components. A least absolute 
shrinkage and selection operator (LASSO) regression was employed to select a subset of the features with 
p-values less than 0.05 as a minimal model, and McFadden’s pseudo-R? was calculated. 


An implementation has been made publicly accessible as an R library on GitHub at 
(https ://github.com/rahuldhodapkar/puddlr). 


Symptom Bi-clustering. 


Participants with Long COVID were clustered based on binary self-reporting of Long COVID symptoms. 
Hamming distance was used with complete linkage clustering as anagglomeration method. Visualization 
of the bi-clustering was performed using the ‘ComplexHeatmap’ package in R*°. Cluster stability was 
assessed by bootstrapped resampling with 100 iterations using the ‘fpc’ package in R*'. 


General Statistical Analysis 


Study sample sizes were not pre-determined through formal power analysis. Specific statistical 
methodology can be found in relevant figure legends and manuscript text. Generally, comparison of 
immunophenotypic features including systemic cytokine levels and antibody concentrations between 
study cohorts was performed using estimates of group medians, primarily with non-parametric Kruskal- 
Wallis tests. All statistical tests were two sided. 


The difference in median between,the days from the symptom onset (DFSO) of acute COVID-19 in the 
LC and CC groups was assessed using a two-tailed Brown-Mood median test with an alpha of 0.05. The 
test was performed using the ‘coin’ package in R®. Flow cytometry populations were assessed using 
estimates of group means with permutational testing using PERMANOVA to control for within-group 
heterogeneity (described above). 


In cases where Kruskal-Wallis testing indicated significant differences, post-hoc testing using Dunn’s test 
was performed. Correction for multiple comparisons was performed using Bonferroni or Bonferroni- 
Holm method as indicated. All statistical tests were performed using R, PRISM, and MATLAB software. 


Data availability 


All of the raw fcs files for the flow cytometry analysis are available at the FlowRepository platform 
(http://flowrepository.org/) under Repository ID: FR-FCM-Z6KL. Protein structures were visualized 
using UniProt repository under the following accession numbers: trimeric Spike (PDB: 6VXX) and EBV 
gH/gL (PDB: 5T1D). Raw data are included in Supplementary Table 3. 


Code availability 
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Computer codes are available as indicated (e.g., Attps://github.com/rahuldhodapkar/puddlr) or otherwise 
available upon request. 
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Extended Data Figure Legends 


Extended Data Figure 1. Additional demographic and clinical analysis of Long COVID cohort. (A) 
Box plots of Min-Max normalized survey responses (n=35 HC, 20 CC, 98 LC). Individual survey 
instruments are arranged in columns with corresponding health dimensions below. Surveys in red were 
aggregated to generate Long COVID Propensity Scores (LCPS). Significance was assessed using Kruskal- 
Wallis tests corrected for multiple comparisons using Bonferroni’s method. (B) Receiver-Operator Curve 
(ROC) analysis of LCPS scores; Area under the curve (AUC) is reported with Bootstrap Bias-corrected 
95% confidence intervals (CI) of AUC. (C) Ring plots of prevalence of Postural Orthostatic Tachycardia 
Syndrome (POTS) among Long COVID cohort. “No diagnosis” is represented by grey regions, “positive 
diagnosis” is represented _by shaded purple regions. Purple regions are further stratified by diagnostic 
modality: clinical = diagnosed»through clinical evaluation (light purple); Tilt-table = diagnosed by Tilt- 
table (middle purple); Stand / Lean = diagnosed by Stand / LEAN test (dark purple). (D) Ring plots of 
prevalence of self-reported negative impacts on employments status among individuals with Long COVID. 
Negative responses are represented by grey region, positive responses are indicated by purple region. (E) 
Heatmap of self-reported binary symptoms clustered by Hamming distances (rows and columns) and 
colored. according to physiological system as previous. Columns are annotated by LCPS scores with 
bootstrapped cluster reproducibility scores reported in parentheses (bootstrapped Jaccard similarity) (F) 
Boxplots of Long Covid Propensity Score (LCPS) plotted by group (HC = healthy control; CC = 
convalescent control; LC = Long COVID) and cluster. Central lines represent group medians, bottom and 
top lines represent 25" and 75" percentiles, respectively. Whiskers represent 1.5 inter-quartile range 
(IQR). Significance for difference in median LCPS was assessed using Kruskal-Wallis with correction for 
multiple comparisons using Bonferroni-Holm. 


Extended Data Figure 2. Immunological differences in myeloid and lymphocyte effectors among 


participants with Long COVID. (A-B) Violin plots of myeloid peripheral blood mononuclear populations 
(PBMCs) plotted by group as percentages of respective parent populations (gating schemes detailed in 
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Extended Data Fig. 10). (B, right) Coefficients from linear model are shown. Model predictors are 
indicated on x-axis. Significant predictors (p<0.05) are plotted in purple. Detailed model results are 
reported in Extended Data Table 4. (C) Violin plots of B lymphocyte subsets from PBMCs plotted as 
percentages of respective parent populations (gating schemes detailed in Extended Data Fig. 10). (D,E) 
Violin plots of various CD4 (top row) and CD8 (bottom row) populations. (F) Violin plots of IL-4 and IL- 
6 double-positive CD4* (left) and CD8" (right) T cells plotted as percentages of total CD4” or CD8" T cells. 
(G) A PERMANOVA test of the association between all cell populations shown and participant age, sex, 
LC status, and body mass index (BMI). For all violin plots (A—F), significance was assessed using Kruskal- 
Wallis corrected for multiple comparisons using Bonferroni-Holm. Each dot represents a single patient 
(n=40 HC, 33 CC, 99 LC). Central bars indicate the median value of each group. Only significant 
differences between group medians are shown. 


Extended Data Figure 3. Circulating myeloid, B cell, and cytokine producing immune cell 
populations among MY-LC participants. (A—I) Violin plots of various myeloid, B, and T cell PBMC 
populations stratified by healthy (HC), convalescent (CC), and Long COVID (LC) groups. Significance for 
differences in group medians was assessed using Kruskal-Wallis tests with correction for multiple 
comparisons using Bonferroni-Holm. Each dot represents a single patient (n= 40 HC, 33 CC, 99 LC) (J- 
KX) Coefficients from linear models for various PBMC populations. Bars in purple indicate significant 
predictors of specific PBMC populations (p < 0.05). 


Extended Data Figure 4. Absolute Counts of in myeloid and lymphocyte effectors among participants 
with Long COVID. (A-B) Violin plots of myeloid peripheral blood mononuclear populations (PBMCs) 
plotted by group (HC, healthy control; CC, convalescent.control; LC, Long COVID) as absolute cell counts 
(gating schemes detailed in Extended Data Figure 10A). Significance for differences in group medians 
was assessed using Kruskal-Wallis tests with correction for multiple comparisons using Bonferroni-Holm. 
(C) Violin plots of B lymphocyte subsets from peripheral blood mononuclear populations (PBMCs) plotted 
as absolute cell counts (gating schemes detailed in Extended Data Figure 10D). Significance was assessed 
using Kruskal-Wallis with correction for multiple comparison using Bonferroni-Holm. (D, E) Violin plots 
of various CD4 (top row) and CD8 (bottom row) populations. Significance was assessed using Kruskal- 
Wallis with correction for multiple comparison using Bonferroni-Holm. (F) Violin plots of IL-4 and IL-6 
double positive CD4* (left),and CD8* (right) T cells plotted as absolute cell counts. Significance was 
assessed using Kruskal-Wallis with correction for multiple comparison using Bonferroni-Holm. For all 
plots (A-F), central bar insthe violin plot indicated the median value of each group. Each dot represents a 
single patient (n = 37,HC, 28 CC, 94 LC). 


Extended Data Figure 5. Humoral Analysis of SARS-CoV-2 specific antibodies. (A) Dot plots of IgG 
concentrations from historical, unvaccinated SARS-CoV-2 exposed controls (HCW+) and unvaccinated 
Long COVID participants. Central lines indicate median group values with bars representing 95% CI 
estimates. Vaccination status for each cohort is indicated by the form “x0” where the digit indicates the 
number of SARS-CoV-2 vaccine doses. Significance for differences in group medians were assessed using 
Kruskal-Wallis with correction for multiple comparison using FDR (Benjamini-Hochberg). Each dot 
represents a single patient (n= 19 HCW, 19 LC). (B) Coefficients from linear models are reported for anti- 
RBD antibody responses. Model predictors are reported along the x-axis and included age, sex (categorical), 
Long COVID status (categorical), body mass index (BMI), and number of vaccinations at blood draw. 
Significant predictors (p < 0.05) are plotted in purple. Detailed model results are reported in Extended Data 
Table 5. (C) Boxplots of antibody binding to various SARS-CoV-2 linear peptide sequences plotted by 
group (HC = healthy control; CC = convalescent control; LC = Long COVID) amongst participants who 
have received 1 or more vaccine doses. Each dot represents one individual. Central bars represent groups 
medians, with bottom and top bars representing 25" and 75" percentiles, respectively. Dashed line 
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represents z-score threshold for epitope positivity defined by SERA. Statistical significance determined by 
Kruskal-Wallis with correction for multiple comparisons using Bonferroni-Holm. Each dot represents an 
individual patient: LC (purple, n= 80), HC (orange, n = 39) and CC (yellow, n= 38). (D) Proportion of each 
group amongst participants who have received 1 or more vaccine doses (LC: n= 80, control: n= 77) that is 
seropositive (z-score>3) for each of 7 linear Spike motifs mapping to outlier peaks. Motifs with 
significantly different seropositivity between groups are highlighted in red, as determined by Fisher’s exact 
test corrected for multiple comparisons by FDR (Benjamini-Hochberg). (E) Coefficients from linear models 
are reported for anti-RBD antibody responses. Model predictors are reported along the x-axis and included 
age, sex (categorical), Long COVID status (categorical), body mass index (BMI), and number of 
vaccinations at blood draw. Significant predictors (p < 0.05) are plotted in purple. Detailed model results 
are reported in Extended Data Table 5. Abbreviation: HCW+, previously SARS-CoV-2 infected healthcare 
worker. 


Extended Data Figure 6. Significantly different soluble plasma factors across MY-LC cohorts. (A— 
H) Violin plots of various z-score transformed circulating plasma factors across healthy (HC), convalescent 
(CC), and Long COVID (LC) cohorts. Significance of difference in group medians was assessed using 
Kruskal-Wallis corrected for multiple comparisons using Bonferroni’s method. P-values from multiple 
Kruskal-Wallis testing were adjusted using the Benjamini-Hochberg procedure. (I) Negative Logio 
transformed p-values from Kruskal-Wallis tests plotted against-Spearman correlations with LCPS for 
various plasma factors. Reported p-values are adjusted for multiple comparisons using FDR (Benjamini- 
Hochberg). Horizontal line represents significance threshold for a.difference in group medians. Vertical 
lines represent the minimum correlation values for plasma factors significantly correlating with LCPS 
scores. Red depicts factors with significant differencesin group medians and significant correlations with 
LCPS. 


Extended Data Figure 7. Analysis of private autoantibodies within the MY-LC cohort. (A—B) 
Correlation plots depicting relationships between number of autoantibody reactivities and %DN of B cells 
(A) or days from symptom onset (DFSO) and number of autoantibody reactivities (B). For all panels, 
correlation was assessed using Spearman’s method. Black line depicts linear regression with 95% CI 
shaded. Colors depict Long COVID cluster (cluster 3, blue; cluster 2, green; cluster 1, red). Each dot 
represents one individual. (C) Grouped box plot depicting reactivity magnitude per individual in the listed 
GO Process domain. Reactivity magnitude is calculated as the sum of REAP scores for all reactivities per 
individual in a given GO Process domain. Statistical significance assessed by Kruskal-Wallis and adjusted 
for multiple comparisons using. FDR (Benjamini-Hochberg) correction. Boxplot colored box depicts 25th 
to 75th percentile of the data, with the middle line representing the median, and outliers depicted as points. 
(D) Heatmap depicting:autoantibody reactivity for GPCRs included in the REAP library. Each column is 
one participant, grouped by control or LCPS cluster. HC = healthy control, CC = convalescent control, LC 
= Long COVID. Abbreviations: GPCR = G-protein coupled receptor. 


Extended, Data Figure 8. Non-SARS-CoV-2 humoral responses among participants with Long 
COVID. (A) Heatmap depicting REAP reactivities to viral antigens across the MY-LC cohort. Each 
column is one participant, grouped by control or LCPS cluster. Column clustering within groups performed 
by K-means clustering. Each row is one viral protein. Reactivities depicted have at least one participant 
with a REAP score >= 2. (B) REAP scores for VZV gE by group (HC = healthy control; CC = convalescent 
control; LC = Long COVID). Statistical significance determined by Kruskal Wallis with correction for 
multiple comparison using Bonferroni-Holm. Each dot represents one individual (n= 25 HC, 13 CC, 98 
LC). Bottom and top lines depict 25" to 75" percentile of the data, with the middle line representing the 
median. Whiskers represent 1.5x the inter-quartile range (IQR). (C) Proportion of each group (LC: n= 99, 
control: n= 78) seropositive for each of 30 common pathogen panels as determined by SERA, grouped by 
pathogen-type (LC = Long COVID). Statistical significance determined by Fisher’s exact test corrected 
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with FDR (Benjamini Hochberg). (D) Sum of SERA-derived z-scores for IgM reactivity to EBV antigens 
plotted by group. Statistical significance determined by Kruskal-Wallis with correction for multiple 
comparison using Bonferroni-Holm. Each dot represents one individual (n= 22 Mono-control, 40 HC, 38 
CC, 98 LC). Boxplot colored box depicts 25" to 75" percentile of the data, with the middle line representing 
the median. Whiskers represent 1.5 the inter-quartile range (IQR). (E) Standard curve for Taqman PCR 
of EBV BNRF1. Serial dilutions of EBV standard ranging from | to 10° copies per 200 uL input material 
were made. C; values are plotted against standard copy number, demonstrating ability to detect 1 genome 
copy. (F) Copies of EBV genome detected in participant serum by Taqman PCR for EBV BNRF7 plotted 
by group. All samples were below the limit of detection. (G) Correlation plot depicting the relationship 
between EBV p23 REAP score and EBV p23 ELISA O.D. 450 nm. Correlation assessed by Spearman. 
Black line depicts linear regression with 95% CI shaded. Colors depict group (purple, LC; yellow; CC; 
orange, HC). Each dot represents one individual. (H,D) REAP scores for HSV1 gD1 (H) and HSV1 gL () 
amongst HSV 1 seropositive individuals only, separated by group (HC = healthy control; CC = convalescent 
control; LC = Long COVID). Statistical significance determined by Kruskal Wallis with,correction for 
multiple comparison using Bonferroni-Holm. Each dot represents one individual, Boxplot colored box 
depicts 25" to 75" percentile of the data, with the middle line representing the median. Whiskers represent 
1.5x the inter-quartile range (IQR). Each dot represents one individual. (J,KK) Correlation plot depicting the 
relationship between Long COVID Propensity Score (LCPS) and EBV gp42 PVXF[ND]K (J) or EBV p23 
REAP score (K). Correlation assessed by Spearman. Each dot represents,one individual. Colors depict Long 
COVID cluster (cluster 1, blue; cluster 2, green; cluster 3, red), Black line depicts the linear regression, 
with the 95% CI shaded. (L-O) Linear regressions of various SARS-CoV-2 antigens and IL-4/IL-6 double 
positive CD4 T cells. Spearman’s correlation were calculated foreach pair of variables, with corresponding 
p-values reported. Black lines depict linear regressions withthe shaded area representing 95% confidence 
boundaries. 


Extended Data Figure 9. Gale-Shapley matching of Long COVID group and controls harmonizes 
samples by disease and demographics characteristics. (A) Features used in the preference list 
construction for Gale-Shapley matching are shown. Individual paired samples are shown for participant age 
and days from initial acute COVID-19 infection (dfso). Paired plots for sex and vaccination status are 
shown. (B) Additionally, differences between populations in the severity of initial acute COVID-19 
infection are shown. No differences between groups are significant by a Chi-square test. (C,D) Box plots 
of selected features assessed in the Ext; LC group. Center lines represent median values with error bars 
representing 1.5 standard deviation. (E) Distribution of respiratory symptoms (“dyspnea” or “shortness of 
breath”) between individuals with Long COVID in the MY-LC study and the Ext. LC group. Significance 
was assessed using Fisher’s exact test. (F—H) ROC curve analysis using cortisol, cDC1, and galectin-1 
levels as an individual classifier of Long COVID status. AUC and 95% CI intervals (DeLong’s Method) 
for each feature are displayed (top). Kernel-density smoothed histograms for HC, CC and LC cohorts for 
selected model,predictors. Vertical lines depict threshold values for each feature with maximal 
discriminatory accuracy (bottom). 


Extended Data Figure 10. Flow Cytometry gating schematics. (A—D). Various gating strategies for 
granulocyte and myeloid populations (A), T lymphocytes (B), intracellular cytokine staining (C), and B 
lymphocytes (D). 


Extended Data Table Legends 


Extended Data Table 1. Clinical Demographics of MY-LC Cohort. Summary demographic and clinical 
characteristics for the MY-LC Study. Participants were stratified into three study arms at enrollment: (1) 
Long COVID (prior SARS-CoV-2 infection with persistent, unexplained symptoms); (2) healthy study site 
cohort (no prior SARS-CoV-2 infection); or (3) convalescent COVID-19 cohort (prior SARS-CoV-2 


29 


1329 
1330 
1331 
1332 
1333 
1334 
1335 
1336 
1337 
1338 
1339 
1340 
1341 
1342 
1343 
1344 
1345 
1346 
1347 
1348 
1349 


1350 
1351 
1352 


1353 
1354 


1355 
1356 
1357 


1358 
1359 


1360 
1361 


infection without persistent symptoms). Various demographic features and clinical characteristics are 
reported by row for each cohort (row measurement units are specified in parentheses). Within each cell, 
counts or clinical feature averages are reported, with sample standard deviations, relative cohort 
percentages, and participant numbers reported where pertinent. Results from statistical tests are reported as 
p-values and accompanying test statistics: + Chi-square test p-value (Chi-square test statistic, degrees of 
freedom (df)); ¢+ Kruskal-Wallis ANOVA p-value; t+? Fisher's exact test p-value (Odd's Ratio: [95% 
Confidence Interval (Baptista-Pike)]); { Mann-Whitney U test p-value. Post-hoc comparisons ‘were 
conducted using Dunn’s test with Tukey’s correction for multiple comparison (column comparison order 
left-right: 1-2, 1-3, 2-3). Participant medical histories were collected and collated from binary self-reports 
of prior medical history and review of electronic medical records by study staff (positive responses in either 
participant self-report or EMR review were considered an overall binary positive response). Abbreviations: 
n, number; M, male; F, female; BMI, body mass index; +PCR, positive result from SARS-CoV-2 nucleic 
acid test; +Ab, positive result from SARS-CoV-2 antibody test; Y, Yes; N, No. 


Extended Data Table 2. Normalized survey responses across MY-LC cohorts. Survey responses for 
participants are organized by individual instruments (columns) and MY-LC cohorts (rows). Participant 
responses for each survey instrument were summed and normalized using standard min-max normalization 
procedures such that a value of 1 equals the maximum possible aggregate score and 0 equals the minimum 
possible aggregate score. Additionally, individual survey elements were oriented through inversion such 
that higher normalized scores on each instrument indicate a higher intensity or degree of agreement with 
survey prompts. For each cohort, median values are displayed. 


Extended Data Table 3. Determinations of optimal LCPS threshold. Classification metrics across 
different LCPS thresholds (‘Cut-offs’) (Upper table). Summary area-under the curve (AUC) statistics and 
bootstrap confidence intervals for Receiver-Operator curve analysis (ROC) (lower table) 


Extended Data Table 4. Modeling of select flow cytometry populations. (A—L) Detailed linear modeling 
results are reported for various cytokine producing T cell populations analyzed by flow cytometry. 


Extended Data Table 5. Modeling of anti-SARS-CoV-2 antibody and linear motif responses. (A—E) 
Detailed linear modeling results are reported for SARS-CoV-2 specific antibody responses and peptide 
motifs with corresponding model formulations. 


Extended Data Table 6. Modeling of cortisol levels. Detailed linear modeling results are reported for 
cortisol levels across:groups with corresponding model formulation. 


Extended Data Table 7. Inter-model Long COVID classification comparison. Cohen’s Kappa analysis 
of agreement between LCPS and Integrated immunological classification of Long COVID status. 
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Extended Data Fig. 2 


A Classical Monocyte Int. Monocyte Neutrophil Eosinophil B pDC 
30 
5 1.5 
« 
@ 20 
1s) 
o 15 1.0 
z 
pat | 
x) 
oe 1.0 
5 10 . 0.5 
2 
a 0.5 
0 0.0 
0.0 
HC cc Lc HC cc LC HC CC LC HC CC LC HC CC LC 
C Naive B cells pak og US Memory B cells CS Memory B cells D Naive CD4T cells. 
120 Y 
0.039 if) 100 
75 60 
n 
75 ra 
2 80 50 8 
g t 40 
ps 5.0 a 
S 50 O 
=< 40 25 a) 
8 25 = 
o 25 - 
7 @ 
0 0 0 
0.0 0 
HC cc Lc HC cc Lc HC CC LC HG CC »>LC HC CC LC 
E IL-17 F GZMB G IFNy H IFNy | TNFa 
CD4T cells CD8 T cells CD4T cells CD8 T cells CD4T cells 
60 100 40 80 
75 75 
~ 30 60 
5 40 
Ps 50 50 
Qa 
5 20 40 
S 20 
8 = 10 20 25 
oO 
o 
0 : 0 0 0 
HC cc Lc HC @CCEmLC HC cc Lc He cc Le HC cc Lc 
J Non-classical Monocytes DN B cells Activated B cells K IL-2* CD4+ IL-4* CD4+ 


0.5 = MB Not signficant 3 8 4 
IB Significant 

c = e 
8 g 8 8 3 

£2 & & 

Oo oO oO 

fe} Qo 4 fe} 
(S) {S) O92 

3 3 3 

© 1 52 8 
= = = 1 

0 
0 0 
Age Sex LC BMI Age Sex LC BMI Age Sex LC BMI 
CD4* IL4*/IL6* CD8* IL4*/IL6* 
25 1.5 

_20 _ me 

= = [= 
8 8 1.0 & 2.0 

1.5 & & 
oO oO 245 
fo} je) ol. 

S10 = 2 
a 205 © 1.0 

le} ° ° 
20.5 2 205 
0 0 0 


Age Sex LC BMI Age Sex LC BMI 


Extended Data Fig. 3 


Model Coefficent 


Age Sex LC BMI 
IL-2* CD8+ 


Model Coefficent 


Age Sex LC BMI 


bel 
a 


= = i) 
oO a oO 


° 
a 


_— 


NO + a 


oO 


Model Coefficent 


oO 


Age Sex LC BMI 
IL-4* CD8+ 


Age Sex LC BMI 


cDC2 


ia 


HC CC LC 


1.5 


1.0 


o 
a 


0.0 


Naive CD8 T cells 


= 
oa N Oo 
oO oa oO 


Percent of CD8 T cells 
iN) 
a 


HG cc LC 
TNFa 
CD8 T cells 
755 
505 
255 
0 =| 
HC cc LC 
IL-6* CD4+ 
1.5 


= 
Oo 


S 
a 


a 


Age Sex LC BMI 
IL-6* CD8+ 


A 


Age Sex LC BMI 


oO 


NO wo 


Model Coefficent 


= 


A 


x10® cells/ml 


0.1 <0.001 
0.05 
0.00 4 


(CD45RA/CD127/CCR7,) 


0.065 

4 
0.028 

as 


D 


0.3 


0.2 


x10® cells/ml 
x10® cells/ml 


0.1 
0.0 


HC CC LC 


| 0.081 


HC CC LC 


j=) 
NO 
fe 


oO 
a 


x10® cells/ml 
x108 cells/ml 


oO 
oO 


CD4* IL4*/IL6* 
<0.001 
<0.001 


"Tl 


oO 
ua 
oa 


0.104 


x10® cells/ml 


0.05 


0.004 ate ee 


HC CC LC 


non-Conventional Monocytes 
(CD14"/CD16") 


HLA-DR+ (% ncMono) 


B cDC1 
(CD304-/HLA-DR*/CD141"*) 
40 <0.001 .00030 
.00020 
_ 0.010 
E 0.010 
20 3 
o 
[S) 
10 © 0.005- 
x 
les 
0.000- abn 
HC CC LC HC CC LC 
Tem Tex 
(CD45RA/CD127-/CCR7*) (PD-1*/TIM-3*) 
020 0.059 
0.6 0.028 
015 
0.4 E 
2} 
= .010 
Oo 
0.2 S 
i = .0054 
00 .000 
HC CC LC HC CC LC 
0.10 .015: 
= .010 
0.05 a 
3 
[s) 
% 005 
x 
0.00 
.000 
HC CC LC HC cc LC 


CD8* IL4*/IL6* 
0.004 


0457 0.004 
0.007 
E 0.10 
LQ 
i 
oO 
a 
= 0.05 
HC CC LC 


Extended Data Fig. 4 


Activated B cells 
(HLA-DR"/CD86") 


_ 0.08 
£ 
a 
it 
oO 
© 0.04 
x 
{ 
200] & 
HC CC LC 
E IL-2* CD4+ 
0.011 
0.8) 0.032 
0.64 


x10® cells/ml 
° 
PS 


0.2. 
nl OS 
HC cc LC 
IL-2* CD8+ 
0.10. 0.011 
0.007 
E 
2 0.05: 
oO 
oO 
ro) 
* 
0.00 4 


Double negative B cells 


(IgD‘/CD27-/CD24-/CD38") 
<0.001 
0.06 0.001 
0.04 
0.02 
0.00] > in 
HC CC) LC 
IL-4* CD4+ IL-6" CD4+ 
0.001 0.023 
0.6 0.001 0.20, 0.019 
= _ 0.154 
£04 E 
2 a 
9 % 0.10 
5 02 . & 
x * 0.05 
0.0 pe 
0.004 
HC CC LC HG cc LC 
IL-4* CD8+ IL-6* CD8+ 
0.3 0.20 <0.001 
0.004 
| ee | 
E 0.2 = 
2) - 
8 3 
2, 0.1 gS 
x x 
0.0 
HC CC LC HC CC LC 


a 
E 198 
D 
a 
oO 105 °° 
od) e 
qt} 8 eae 
J ole 
& 103 *els 
db *s" 
salt 2 ee 
= 10 
<x 
10! 
HCW+ LC 
— x0 Vac.— 
B Anti-RBD 
2.5 
+e 2.0 
2 
L 
. 1.5 
fo) 
(Ss) 
@ 1.0 
ne} 
(eo) 
20.5 
0.0 
ts N 
S oe ee > 
ow es 
SES oo” 
Vv aes 


i Significant Predictor 
HB non-Significant Predictor 


D 


LDK[WY]F 
KFLPFQQ 
PTWR 


DISGI 


Spike Motif 


RDPQTLE 


RSVAS 


YTMSLG 


0.00 


Extended Data Fig. 5 


0.25 


107 
TF 406 
£ 
[o>) e 
air . 
oO 4 e = 
B10 A e 
a 102 ? e 
& 102 as 


HCW+ LC 
— x0 Vac.— 


Spike RDPQTLE 


Kruskal-Wallis, p = .00058 


100 0.0012 
0.00089 
rr 


zscore 


Spike YTMSLG 


Kruskal-Wallis, p = 0.45 


p = 0.0060 


® Control 
BLC 


0.50 
Seropositivity 


0.75 


p = 0.00066 


1.00 


107 
ay 
€ 10° 
© 108 ‘ 
(O) e 
ae a ee 
a 103 “ 
a ‘J e 
= 102 ° © Hy 
< 
10! 
HCW+ LC 
— x0 Vac.— 
Spike PTWR 


Kruskal-Wallis, p = 0.30 


Spike LDK[WY]F 


125 Kruskal-Wallis, p = 0.0034 
0.00058 ~ 


ns, . 


0 
HC 
Spike KFLPFQQ 
ie 
o 
3 5 
= 
ay 
fe} 
2 
So 
3 
= 
5 
RS N 
es oe s , S&S Ss 
CaN 
PP Sy? 
Na WwW 
8 Spike RDPQTLE 
= 6 
ao) 
2 
© 4 
fe} 
oO 
om 2 
so 
le} 
20 
-2 
2 oh N 
S oe ss Ss 
se Kor 
Wy 


ig 0.0061 
2 108 -. 
> 10° ° 
aoe : 
a 10 % Je 
Z 10° z H se 
= e 
1024 "ge 
101 
HCW+ LC 
— x0 Vac.— 
Spike RSVAS 


Kruskal-Wallis, p = 0.12 


Spike DISGI 
125) Kruskal-Wallis, p = 0.051 


Spike LDK[WY]F 
10 
€ 
2 
o 5 
5 
fe) 
oO 
3 0 
iS) 
= 
5 
@ hr © \ 
ye FOS S&S Ss 
Ps 
Ss Pio 
PW 3 
Spike DISGI 
8 
~€6 
a 
‘iS 
54 
fe) 
Oo 
G2 
no} 
3 
= 0 
2 
re So S&S 
y & eS s S 
o —e 
fe) PO 
VL YS 


A Complement C4b B CCL19 C CCL20 


5. Kruskal-Wallis p = 0 0.00028 ao} Kruskal-Wallis p=0 9 90057 ae Kruskal-Wallis p = 0 
<.0001 0.0013 0.00039 
. 8 
3 
6 
g . 2 
N N 
1 
2 
) 
0 
HC cc LC HC CC LC HC cc LC 
D Galectin-1 E ccL4 F APRIL 
5} Kruskal-Wallis p = 0 0.077 7+ Kruskal-Wallis p = .01 Kruskal-Wallis p = .01 
oe 0.017 
6 
0.00014 0.0018 
4 6 0.012 
5 
3 5 
4 
4 
2 ? 2 2 3 
fe) fo) ro} 
is) is) 3 13) 
® ® ® 
Nn 1 N N 2 
2 
) 1 
1 
-1 0 
0 
HC cc LC HC CC LC HC CC LC 
G LH H IL-5 | 
Kruskal-Wallis p= .02 9.958 12r Kruskal-Wallis p = .02 Cortisol” 
0.011 0.011 i 
5 10 a 
a 
= 
4 8 g 
2 
< 
o 3 © 
8 5 «6 8 
[s) Oo ow 
® ® 2 
N 92 N for 
: a) 
Ww 
1 = ; 
2 > Galectin-1|@Complement C4b 
) 
) 
O 
HC CC Lc HC cc Lc -0.6 -0.4 -0.2 0 0.2 0.4 


Spearman Rho (LCPS) 


Extended Data Fig.6 


Memory 


> 
0 


for) 
oO 
e 


30 

2 8 

8 40 2 R = -0.037, p = 0.72 

a © 

rs 9 20 

zZ © te % 

a xe) 

x 20 ° R=0.14, p= 0.16 . 
240 e LC Cluster 4 
E e LC Cluster 2 
Zz e LC Cluster 3 


oO 


Number of Reactivities 


20 
a 
© 15 
(> 
3 
5 10 
= 
= 
® 5 
E 
Z 
oe O 
Behavior Cognition Learning Digestive Immune Muscle Nerve Action Blood Heart Response Temperature Pain Sensory Smell Sound Vision 
and System System Systen Transmission Potential Pressure Rate to Perception 
Activity 
DFSO 
D CC LC Cluster 1 LC Cluster 2 LC Cluster 3 HC 
>= REAP score 
: : ; 10 
Ltrs 8 
6 
4 
2 
0 


d 

ADCYAP1R1_Epitope-3-N 
ADGRA1_Epitope-3-N 
ADGRB3_Epitope-3-N 
ADRA1A_Epitope-1-N 
AGTR1_Epitope-1-N 
AVPR1B_Epitope-3-N 
BRS3_Epitope-1-N 
C3AR1_Epitope-3 
CALCR_Epitope-2-N 
CALCRL 
CCR1_Epitope-2-N 
CCR7_Epitope-3-N 
CYSLTR2_Epitope-3-N 
EDNRA_Epitope-3-N 
GHRHR 
GHRHR_Epitope-3-N 
GPBAR1_Epitope-3-N 
GPER1_Epitope-4-N 
GPR132_Epitope-3-N 
GPR141_Epitope-4-N 
GPR146_Epitope-3-N 
GPR173_Epitope-3-N 
GPR183_Epitope-1-N 
GPR20 
GPR20_Epitope-2-N 
GPR37 

GPR50_Epitope-1-N _| 
GPR83_Epitope-3-N 
GPR87_Epitope-4-N 
GPRCS5B_Epitope-3-N 
GPRC5C_Epitope-1-N 
GPRC5D_Epitope-1-N 
GPRC6A 
GRM5 
GRPR 
HCRTR2 
HTR1E_Epitope-3-N 
LTB4R_Epitope-4-N 
MCHR2_Epitope-3-N 
MLNR_Epitope-1-N 
MRGPRE_Epitope-2-N 
MRGPRX4_Epitope-1-N 
NPSR1_Epitope-1-N 
NTSR2_Epitope-4-N 
OPN3_Epitope-3-N 
OPRM1_Epitope-3-N 
P2RY1_Epitope-2-N 
P2RY11_Epitope-4-N 
P2RY12_Epitope-3-N 
P2RY12_Epitope-4-N 
P2RY14_Epitope-3-N 
P2RY4, Epitope-1-N 
PTGDR_Epitope-3-N. 
SSTR2_Epitope-3-N 
TAARSP_Epitope-1-N 
TAAR8_Epitope-2-N 
TBXA2R_Epitope-3-N 


Extended Data Fig. 7 


A HC CVC LC Cluster 3 LC Cluster 2 LC Cluster 1 
| en || REAP score 
SARSCoV2_Alpha_S1R@O| 7 10 
SARSCOV2_ Beta S1RD| 
senso SARSCOV2_Detta_S1RBD| 8 
SSARSCOV2_ Epsilon S4RAD F 
SARSCOV2 Gamma S1RED 6 
SARSCOV2_Omieron_S1RBD i 4 
Coronavirus []] Rosen! S 
, 0 
Herpesvirus 
Respiratory 
Virus 
; Norovius. G4 2018. VPt 
Reovirus Pee oe 
ouvued SIP1A8_VPa 
Adenovrus3_ fer 
Adonovrs31_fer| 
CoxsacioB4_VP1 
= | eee 
Polovins. Vaccine VP1 
Rubela_Vaccine Spike 
WEE. £2 
Weste_Envlopoe 
B Ci. 00 EBV IgM 
VZV gE Kruskal-Wallis, p = 3.1e-13 


Kruskal-Wallis, p = 0.015 Cohort 
0.031 0.75 Control 


Hic 


0.50 Panel Category: 
Bacterial/Fungal 
Parasitic 
Tick-borne 


0.25 Viral 


seropositivity 


REAP score 


0.00 | 
Hc cc LC BRVESTEEITECERT FA USS EEL Ee Mono HC CC LC 
sos [ocd = i = o “3 @ozeesz B a 
E Sess ssssseerggGs es sess sssssaza Control 
~PS$PFSalC FSSGPE Sosa ltlPasseoypuorwrs2rs 10 
50 y-int= 98.22 Peet eres eee oh Cee) CELE Ee: 
e = GH De os 290 & 
slope = -3.245 B28 SS@PSSERMoEBSseseSs E5588 
40: : 8 5 SoS8 ES gs SMe mee Fae gO FESS 
R?= 0.9973 3g SEESSERS SE aygrgge Teesge Bs 4 Limit of Detection 
2 3 2- 3 23 Q 3 
30 698g 2 kgs OSs FEZ 2g56 8 
rad a o 2 Ba 5 CMe re zx. EET 7 = 
oO a2 Ss 2 D®@ Eg & = eal = 
20 eee a oan & seca 2S ° HC 
ge 8 E° WW s258 k Sees «e 
10 g z 2 8 23 3 z5232 6 = CC 
& & ga wu a = 
2 wu a Lc 
“40° 10' 10° 10° 10° 10° 10° 10 
Copies/ 200 uL of Input 
G R=0.73 p<2.2e-16 H HSV1 gG1, seropositive only | HSV1 gL, seropositive only J 
Kruskal-Wallis, p = 0.38 Kruskal-Wallis, p = 0.024 
4 - 
wo 
+ * 
a 
(eo) 
< 
a 
— 
i 
ise} e . . 
N e . 
[os e ° 
> 
mo 
Ww 


R=0.15, p = 0.16 


0 2 4 6 8 
EBV p23,REAP score EBV gp42 PVNFNK zscore 


0 10 20 30 40 50 


e LC Cluster 1 
- m e LC Cluster 2 
0 R= .012, p= 0.27 e LC Cluster 3 


T 
0.0 2.5 5.0 Ts 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 
EBV p23 REAP score IL-4/IL-6 CD4 T cell IL-4/IL-6 CD4 T cell IL-4/IL-6 CD4 T cell IL-4/IL-6 CD4 T cell 


Extended Data Fig. 8 


600 


age 


200 


Controls LC 


Oo 


galectin-1 (z-score) 
EBV gp42 PVXF[NtD]K (z-score) 


HC CC LC Ext.LC 


F Cortisol 


0.25 bX 


7 AUC = 0.96 
(95% DeLong Cl [0.93 - 0.99]) 


0.00 
0.25 0.50 0.75 1.00 
FPR 
Cortisol 
1.2 DHce 
occ 
Oc 
0.9 
= 
& 0.6 
ao 
0.3 
0.0 


Extended Data Fig. 9 


Sex 


Ho Male 
oo Female 


Long COVID 


Controls 


Controls LC 


Vac 


oo Unvac 
Ho Vac 


B Acute Disease 


oo Hosp. 
oo Non-Hosp. 


000 


0.0012 


HC CC 


G cDC1 


LC Ext. LC 


0.75 


0.25 


AUC = 0.95 
(95% DeLong Cl [0.91 - 0.98]) 


0.00 
0.25 0.50 0.75 1.00 
FPR 
cDC1 
GHC 
2.0 occ 
tc 
1.5 
cy 
2 1.0 
no} 
0.5 
0.0 


-2.0 -1.5 -1.0 -0.5 0.0 
log10(% of live cells) 


Total = 99 
MY-LC 


| + Dyspnea 


| - Dyspnea 


Total = 52 
Ext. LC 


H Galectin-1 


0.75 
& 0.50: 
—_ 
0.25 
AUC = 0.68 
0.0 (95% DeLong Cl [0.60 - 0.77]) 
0.25 0.50 0.75 1.00 
FPR 
Galectin-1 


density 


0) 
z-score 


> 


From live singlets 


From CD3-CD19° 


From CD3°CD19° 


From CD3°CD19° 


From CD3°CD19°CD56" 


From CD3-CD19°CD56- 


From Granulocytes CD56 CD66b" CD56 CD66bCD14-CD16 + CD66bCD14-CD16'CD304° CD66b'CD14- CD16°CD304- 
Mi je termediate 
¢ | cMlonocyle Monocyte q 
w <x q < <x 
a T cells is 3 Granulocytes ey < < DCs wo oO 
8 S biz a ui : 5 2d 
3 “4 n oi =| 6 a | ao ao 
> fs) a Oe es] a t a Oj i 
a 5 NKcells |a@ ic S| asd 8 ina ina v 
8 a 918 3a|x Qa 9 9 { 
8 a gia Flea oq < a} cDC2s 
© dp5¢- xr Y = ao x 
a CD16 
© épecb: 4 4 _ ncMonocyte 4 4 u 
= am ~ cpt eee are Ae 
CD19 BV711-A CD56 PE-Dazzle594-A CD16 BV786-A CD16 BV786-A FSC-A CD141 AF700-A CD1c AF647-A 
From live singlets From T cells From CD8* T cells From CD8* T cells From CD8*CD45RA‘ T cells From CD8*CD45RA T cells From CD8*CD45RA T cells 
T cells CD4 T cells exhaution T 
4 «td td 4 4 4 
s $ 6 3 JcD45RA" $ ; $ tm. |¢ ——— 
wo o = S x Naive ba Oo 
o co | wo ] | ae Qi QI | 
© iS ao ns ¥ y+ & 
a a a = a a 2 
2 | t a £ xy] 5] = 
fa) Qa D 6 ta Fy 
(S) ° s a 8 3 
d EI xj Activated | O 4 , ] ] Tem 1 
4 T cells CD8 T cells 4 CD45RA- j 
Foot abd r r err y r api T leg! oer magia er rer : r 
FSC-A CD8 APC-Cy7-A CD38 BV711-A CD3 BV605-A CD127 PE-Cy7-A CD127 PE-Cy7-A PD-1 PE-A 
From CD4* T cells From CD4* T cells From CD4*CD45RA‘ T cells From CD4*CD45RA T cells From CD4*CD45RA Tcells | From CD4*CD45RA’‘T cells 
z 2 heRE exhautionT | < 4 Treg x 
RY = ¢ ¢ sat |e 3 > 6 {effector Treg 
bj R JCD45RA* aq] Q 2 uy ib | : 
a tq Ry z a4 (6) a 
a < B a 2 a 7 
1 
| g | RY & 2 | 84 1 
s ; a 8 8 8 = 
<q Activated | @ ] ] a 4 tq 
CD4 T cells 4CD45RA° Tem E| Natural Treg 
perma r r TTT 7 r re oer hy rer a r r aaiealeaeil er ToT 7 7 
CD38 BV711-A CD3 BV605-A D127 PE-Cy7-A CD127 PE-Cy7-A PD-1 PE-A CD127 PE-Cy7-A CD3 BV605-A 
C From live singlets From CD8* T cells From CD8* T cells From CD8*T cells From CD8* T cells 
7 7 
ai GranzymeB* NY TNE" IFNy*TNFa* J IL-2IL-6* IL-2"IL-6* JIL-41L-6* IL-4*IL-6* 
¢ + <4 <4 
8 pe 83 iN Ny 
8S a J >] > | 
= 2 | E S S) 
a € <x Ww Ww 
9 rv 5 o oa 
< © 
. By #4 x s 
| IFNy*TNFo 7 IL-2°1L-6 | IL-4*IL-6- 
aan aan caieanaaaeiaait oer an or To ee rower ere 
FSC-A CD3 BV605-A IFNy APC-Cy7-A IL-2 PE-A IL-4 AF647-A 
From T cells From CD4* T cells From CD4* T cells From CD4* T cells From CD4* T cells 
CD4T cells j IL-7" IFNyTNEa*| IFNy*TNFa* jf IL-21L-6" | IL-2"IL-6" 4 IL-41L-6* IL-4°1L-6* 
$ $j < <4 +4 
© F is] Si iN &S 
3 3 E e1 o1 
ao | oO < uw uw 
xt i= i} a a 
8 ay ra o4 8 
i t =4 =} = = 
q 8 T cell IFNy*TNFa- ] IL-2°1L-6° ] IL-4°1L-6" 
Pree r a pre oT re rm Fp rere rower remem 
CD8 BV711-A CD3 BV605-A IFNy APC-Cy7-A IL-2 PE-A IL-4 AF647-A 
From live singlets From B cells From B cells From B cells From IgD‘CD27:B cells 
B cells Antibody secreting cells a Naive B] US memory B 
<4 4 37 Z <1 
< p < D 
8 z 5 & 8 
4 8 33 oS © 
z & B 3 a 
So a oo =| x 
q 9 6 wi 8 
8 <4 a a 34 
q 
E | = ' oq 3 
Activated B cells 2 i igh : : 
1 1 CD27: CS'memory B 4 double negative B 
Tee er i A r T r caeenell T 


CD19 BV786-A 


Extended Data Fig. 10 


CD86 PE-Cy7-A 


Ter T 
CD38 BV711-A 


7 
CD27 AF647-A 


CD38 BV711-A 


Demographics} Long COvID Healthy Site Controls Convalescent COVID-19 Controls, p-value (test statistics) 
Enrolled Participants (n) 101 a2 42 Long COVID vs Not Long COVID : 185 
Excluded Participants (n)| 2 (1.98%) 2 (4.76%) 3(7.14%) p=0.41 (OR:0.4040 [0.07569-1.779})' - : 
Cohort Size(n} 99.00 40.00 39.00 : E 178 
age (years) 45.77 £13.18 (n=99) 36.73 £10.17 (n=40) 38,23 411.67 (n=39) p<o.o001" 10,0006, 0.0040, 50.9999) 42.08 £12.87 (n=178) 
sex(M|F)] 32 | 67 (32.32% | 67.68%)(n=99) 12 | 28 (80% | 70%)(n=40) 13 | 26 (83.33% | 66.67%) (n=39) 9465 (.1101,2)"" 57 fa21 (82.02% | 67.98%) (n=178) 
mil 26.04£7.02 (n=99) 24.86 £6.78 (n=39) 24.56 £3.41 (n =38) paogol : 25.46 £6.36 (n=176) 
ace 
Asian 5 (5.05%) 4 (10%) 37.69%) : : 12(6.74%) 
Black 7 (7.07%) 1 (2.5%) 2 (5.13%) - : 10 (5.62%) 
‘American Indian /Alaskan Native 0 (0%) 1 (2.5%) 0 (0%) - - 10.56%) 
Native Hawaiian and Other Pacific Islander 1 (2.01%) 0(0%) 00%) : : 10.56%) 
White 74 (74.75%) 27(67.5%) 27 (69.23%) : : 128 (71.91%) 
Other 5 (5.05%) 7 (07.5%) 6 (15.38%) - : 18 (10.11%) 
Unknown 7 (7.07%) 00%) 010%) - > 7 (3.93%) 
Ethnicity| 
Hispanic 8 (8.08%) 13 (32.5%) 9 (23.08%) > : 30 (16.85%) 
COVID-19 Clinical Testing Positive test vs. No Positive Test 
Clinically Confirmed COVID-19 (+PCR | +A) 70 (70.71%) 33 (84.62%) 7 103 (69.13%) 
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Asthma 26 (25.74%) 2(ai76%) 49.52%) (0014 (OR: 4.333 [1.671 -10.67))" : 32 (17.68%) 
coo 2(1.98%) 0 (0%) 0 (0%) 5035 (ORin.c.)" - 2 (1.1%) 
Other Lung Dysfunction (e.g. Chronic) 10.99%) 0 (0%) 0 (0%) p=50.9999 (OR:n.c.)* - 1(0.55%) 
Stroke 1 (0.99%) 0 (0%) 0 (0%) p=50.9999 (OR:n.c.)" : 10.55%) 
Spinal Cord injury 1(0.99%) 1(2,38%) 0 (0%) =50,9999 (OR: 0.7959 [.04151-15.30})" : 2(1.1%) 
Neurological Dysfunction (e.g. Parkinson's, Epilepsy, Dementia) 1(0,99%) 0 (0%) 1(2.38%) 0.9999 (OR: 0.7959 [.04151 -15.30])' . 2 (1.1%) 
Obesity 8 (7.92%) 6 (14.29%) 0 (0%) 30.9999 (OR: 1.070 [3422-3.283))" : 14 (7.73%) 
Immunological Dysfunction (e.g. Autoimmune) 3 (2.97%) 1(2.38%) 3 (7.14%) 7014 (OR: 0.5859 [0.1446-2.244})" . 7 (3.87%) 
Concer 6 (5.94%) 0(0%) 1 (2.38%) 1342 (OR: 5.032 [0.7916 -58.44))' : 73.87%) 
Anxiety 25 (24.75%) 7 (16.67%) 10 (23.81%) {5978 (OR:1.232 [0.6171 -2.425])" - 42 (23.2%) 
Depression 1615.84%) 28.76%) 9 (21.43%) p=0.8339 (OR: 1.192 [0.5394-2.607)" : 27 (14.92%) 
Other Psychological Diagnoses 2 (1.98%) 0 (0%) 1(2.38%) p=>0,9999 (OR: 1.608 [0.1839 -23.57])' : 3 (1.66%) 
Eating Disorder 2 (1.98%) 0 (0%) 24.76%) 30,9999 (OR: 0.7938 [0.1223 -5.164]}" : 4(2.21%) 
Irritable bowel syndrome 15 (14.85%) 0 (0%) 3 (7.14%) 0129 (OR:4.524 [1.295 -15.09))" - 18 (9.94%) 
Other 0 (0%) 00%) (0%) : : (0%) 
None 22 (21.78%) 29 (69.05%) 28 (66.67%) p=<0.0001 (OR:0.1103 [0.05787 -0.2181)) " = 79 (43.65%) 


Prior Autoimmune Diagnoses (Yes | No} 
Hypothyroidism 
Crahin'sDisease 
Hyperthyroidism 

linclusion Body Myositis 
Microscopic colitis 

Pernicious Anemia 
Polymyositis 

Polyarthralgia 

Primary Biliary Cholangitis 
Rheumatoid Arthritis 

Sicca 

‘Systemic Lupus Erythematosus 
Ulcerative Colitis 

Multiple Sclerosis (remission) 


18 | 81 (18.18% | 81.82%) (a 
99.09%) 
1 (1.01%) 
1 (1.01%) 
1 (1.01%) 
1 (2.01%) 
1 (2.01%) 
1 (2.01%) 
1 (2.01%) 
1 (2.01%) 
1 (2.01%) 
1 (2.01%) 
12.01%) 
1 (2.01%) 

(0%) 


2 | 38 (5% | 95%) 
(0%) 
0(0%) 
040%) 
010%) 
0(0%) 
0(0%) 
010%) 
010%) 
010%) 

1 (3.03%) 
0(0%) 

1 (3.03%) 
010%) 
010%) 


3 | 32 (8.57% | 91.43%) ( 
1 (5.56%) 
0 (0%) 
1 (5.56%) 
0 (0%) 
0 (0%) 
0 (0%) 
0 (0%) 
0 (0%) 
0 (0%) 
0 (0%) 
0 (0%) 
0 (0%) 
0 (0%) 
1 (5.56%) 


p=.0765 (5.140, Ape 


23 | 151 (13.22% | 86.78%) (n=174) 
10 (6.67%) 
1 (0.67%) 
2 (1.33%) 
1 (0.67%) 
1 (0.67%) 
1 (0.67%) 
1 (0.67%) 
1 (0.67%) 
1 (0.67%) 
2 (2.33%) 
1 (0.67%) 
2 (2.33%) 
1 (0.67%) 
10.67%) 


Extended Data Table 1 


Cohort eqS5 eqSvas mrc fss_tot fatigue_vas phq2total gadtotal 


pain_vas prom_sleep neurogol_t pem 
HC 0 0.1 O 0.148148148 0.12 0 ~)0.095238095 0 0.3 0.140569395 0 
cc 0.1 0.13 O 0.166666667 0.1 O 0.071428571 0 0.3 0.209964413 0.1 


LC 0.3 0.35 0.3 0.833333333 0.69 0.166667 0.285714286 0.5 0.5 0.432384342 0.5 


Extended Data Table 2 


Detailed report of sensitivity and specificity 


correctly 
Cutpoint Sensitivity Specificity Classified LR+ LR. 


100.0% 0.00% $5.25% 1.0000 
100.0% 4.94% 57.46% 1.0819 0.0000 
100.00% 8.64% 59.12% 1.0946 0.0000 
100.0% 11.11% 60.22% 1.1250 0.0000 
100.00% 13.58% 61.33% 1.1571 0.0000 
(>=310) 100.0% 16.05% 62.43% 11912 0.0000 
(55352) 100.0% 17.28% 62.98% 12030 0.0000 
(>=358) 10.00% 18.52% 63.54% 12273 0.0000 
(5=358) 10.00% 19.75% 64.09% 1.2462 0.0000 
(55368) 100.0% 20.99% 64.64% 1.2656 0.0000 
(>=380) 10.00% 22.22% 65.19% 1.2857 0.0000 
(5=397) 10.00% 25.93% 66.85% 1.3500 0.0000 
(>=408) 100.0% 28.40% 67.96% 1.3966 0.0000 
(55414) 10.00% 29.63% 68.51% 1.4211 0.0000 
(55418) 10.00% 30.86% 69.06% 1.4464 0.0000 
(>=436) 10.00% 32.10% 69.61% 1.4727 0.0000 
(55437) 10.00% 33.33% 70.17% 15000 0.0000 
(2443) 10.00% 35.80% 71.27% 1.5577 0.0000 
(2446) 10.00% 37.04% 71.82% 1.5882 0.0000 
(2452) 10.00% 38.27% 72.38% 1.6200 0.0000 
(=453) 10.00% 38.51% 72.93% 1.6531 0.0000 
(52454) 10.00% 40.74% 73.48% 1.6875 0.0000 
(>=465) 10.00% 41.98% 74.03% 1.7234 0.0000 
(2467) 10.00% 43.21% 74.58% 1.7609 0.0000 
(52478) 10.00% 44.44% 75.14% 1.8000 0.0000 
(>=492) 10.00% 45.68% 75.69% 1.8409 0.0000 
(52502) 10.00% 48.15% 76.80% 1.9286 0.0000 
(52506) 10.00% 49.38% 77.35% 1.9756 0.0000 
(52514) 10.00% 50.62% 77.90% 2.0250 0.0000 
(22520) 10.00% 51.85% 78.45% 2.0769 0.0000 
(52523) 10.00% 53.09% 79.01% 2.1316 0.0000 
(52584) 10.00% 54.32% 79.56% 2.1892 0.0000 
(52576) 10.00% 55.56% 80.11% 22500 0.0000 
(52583) 10.00% 56.79% 80.66% 23143 0.0000 
(52587) 10.00% 58.02% 81.22% 23824 0.0000 
(22594) 10.00% 58.26% 81.77% 2.4545 0.0000 
(52596) 10.00% 60.49% 82.32% 2.5313 0.0000 
(22600) 10.00% 61.73% 82.87% 2.6129 0.0000 
(22604) 10.00% 62.96% 83.43% 2.7000 0.0000 
(22620) 10.00% 64.20% 83.98% 2.7931 0.0000 
(52627) 10.00% 65.43% 84.53% 2.8929 0.0000 
(52635) 10.00% 66.67% 85.08% 3.0000 0.0000 
(52637) 10.00% 67.90% 85.64% 3.1154 0.0000 
(52651) 99.00% 69.14% 85.64% 32076 0.0145 
(52653) 99.00% 70.37% 86.19% 3.3412 0.0142 
(52674) 98.00% 70.37% 85.64% 3.3075 0.0284 
(2-684) 98.00% 71.60% 86.19% 3.4513 0.0279 
(2-689) 97.00% 71.60% 85.64% 3.4161 0.0419 
(2-691) 97.00% 72.84% 86.19% 35714 0.0412 
(52694) 96.00% 72.84% 85.64% 35345 0.0549 
(52716) 96.00% 74.07% 86.19% 3.7029 0.0540 
(52728) 96.00% 76.54% 87.29% 4.0926 0.0523 
(52744) 96.00% 77.78% 87.85% 4.3200 0.0514 
(52755) 96.00% 81.48% 89.50% 5.1840 0.0491 
(52763) 96.00% 82.72% 90.06% 5.5543 O.o4g4 
(52770) 95.00% 82.72% 89.50% 5.4964 0.0604 
(52781) 94.00% 82.72% 88.95% 5.4386 0.0725 
(52795) 93.00% 82.72% 88.40% 5.3807 0.0846 
(>=840) 93.00% 83.95% 88.95% 5.7946 0.0834 
(2851) 93.00% 85.19% 89.50% 62775 0.0822 
(22855) 92.00% 85.19% 88.95% 6.2100 0.0939 
(>=868) 91.00% 85.19% 88.40% 6.1425 0.1057 

91.00% 86.42% 88.95% 6.7009 0.1041, 
91.00% 87.65% 89.50% 7.3710 0.1027 
90.00% 87.65% 88.95% 7.2900 0.1141 
90.00% 88.89% 89.50% 8.1000 0.1125, 
89.00% 88.89% 88.95% 0100 0.1237 
89.00% 90.12% 89.50% 9.0113 0.1221 
88.00% 90.12% 8.95% 9100 0.1332 
87.00% 90.12% 88.40% ag0se 0.1442 
26.00% 90.12% 87.85% 8.7075 0.1553, 
26.00% 91.36% 88.40% 9.9514 0.1532 
85.00% 91.36% 87.85% 9.8357 0.1642 
24.00% 91.36% 87.29% 9.7200 0.1751 
83.00% 91.36% 86.74% 9.6043 0.1861 
82.00% 91.36% 86.19% 9.4886 0.1970 
81.00% 91.36% 85.64% 9.3729 0.2080 
20.00% 91.36% 85.08% 9.2571 0.2189 
79.00% 91.36% 24.53% 9.1414 0.2299 
78.00% 91.36% 83.98% 9.0257 0.2408, 
77.00% 91.36% 83.43% 9100 0.2518, 
76.00% 91.36% 82.87% 8.7943 0.2627 
75.00% 91.36% 82.32% 8.6786 0.2736 
75.00% 92.59% 82.87% 10.1250 0.2700 
74.00% 92.59% 82.32% 9.9900 0.2808, 
73.00% 92.59% 81.77% 9.8550 0.2916 
73.00% 93.83% 82.32% 11.8260 0.2878 
72.00% 93.83% 81.77% 11.6640 0.2984 
72.00% 95.06% 82.32% 14.5800 0.2945 
71.00% 95.06% 81.77% 14.3775 0.3081 
70.00% 95.06% 81.22% 14.1750 0.3156 
69.00% 95.06% 0.66% 13.9725 0.3261 
68.00% 95.06% 80.11% 13.7700 0.3366 
66.00% 95.06% 79.01% 13.3650 0.3577 
66.00% 96.30% 79.56% 17.8200 0.3531 
65.00% 96.30% 79.01% 17.5500 0.3635 
64.00% 96.30% 78.45% 172800 0.3738 
(51357) 64.00% 97.53% 79.01% 25.9200 0.3681 
(551415) 63.00% 97.53% 78.45% 255150 0.3794 
(551430) 62.00% 97.53% 77.90% 25.1100 0.3896 

61.00% 97.53% 77.35% 24.7050 0.3999 
60.00% 97.53% 76.80% 24.3000 0.4101 
59.00% 97.53% 76.24% 23.8950 0.4204 
58.00% 97.53% 75.69% 23.4900 0.4306 
57.00% 97.53% 75.14% 23.0850 0.4409 
36.00% 97.53% 74.59% 22.6800 0.4511 
55.00% 97.53% 74.03% 222750 0.4614 
54.00% 97.53% 73.48% 21.8700 0.4716 
53.00% 97.53% 72.93% 21.4650 0.4819 
52.00% 97.53% 72.38% 21.0600 0.4922 
51.00% 97.53% 71.82% 20.6550 0.5024 
50.00% 97.53% 71.27% 20.2500 0.5127 
49.00% 97.53% 70.72% 19.8450 0.5229 
48.00% 97.53% 70.17% 19.4400 0.5332 
(51588) 48.00% 98.77% 70.72% 38.8801 0.5265 
(551591) 47.00% 98.77% 70.17% 38.0701 0.5366 
(51594) 46.00% 98.77% 69.61% 37.2601 0.5467 

45.00% 98.77% 69.06% 36.4501 0.556 
48.00% 98.77% 68.51% 35.6401 05670 
43.00% 98.77% 67.96% 34.8301 (0.5771 
42.00% 98.77% 67.40% 34.0201 0.5872 
40.00% 98.77% 66.30% 32.4001 0.6075 
39.00% 98.77% 65.75% 315901 0.6176 
38.00% 98.77% 65.19%30.7801 0.6277 
37.00% 98.77% 64.64% 29.9701 0.6379 


36.00% 100.0% 64.64% 0.6400 
35.00% 100.00% (64.09% 0.6500 
34.00% 100.00% 63.54% 0.6600 
33.00% 10.00% 62.98% 0.6700 
32.00% 10.00% 62.43% 0.6800 
31.00% 10.00% 61.88% 0.6900) 
29.00% 100.00% 60.77% 0.7100 
28.00% )100.00% 60.22% 07200 
27.00% 100,00% 59.67% 0.7300 
25.00% 100,00% s8.s6% 0.7500 
24.00% 10.00%) $8.01% 0.7600, 
23.00%. 100.0% 1$7.46% 0.7700 
(21766) 22.00% 7 100.00% 56.91% 0.7800 
(=1780) 21.00% 100.0% 56.35% 0.7900 
(5=1782) 20.00% 100.00%. §5.80% 0.8000 
c 18.00% 10.00% 55.25% 8100 
c 48.00% 100,00% 54.70% 0.8200 
c 17.00% (100.0% 54.14% 0.8300 
c 15.00% 10.00% 53.04% 0.8500 
c 14.00% 10.00% | $2.49% 0.8600 
c 13.00% 10.00% $1.93% 0.8700 
(551819) 12.00% 10.00% 51.38% 0.8800 
(51843) 11.00% 100.0% 50.83% 0.2900 
(>=1859) 10.00% 100.0% 50.28% 0.9000 
(221870) 9.00% 10.00% 49.72% 0.9100 
(551878) 7.00% 10.00% 48.62% 0.9300 
6.00% 10.00% 48.07% 0.9400 
5.00% 10.00% 47.51% 0.9500 
4.00% 10.00% 46.96% 0.9600 
c 3.00% 10.00% 46.41% 0.9700 
c 2.00% 10.00% 45.86% 0.9800 
c 1.00% 10.00% 45.30% 0.9900 
(> 1963) 0.00% 100.00% 44.75% 1.0000 


Roc Binomial Exact ~ 
Obs Area Std.Err. [95% Conf. Interval] 


181 0.9542 0.0144 0.91477 0.98073 


Extended Data Table 3 


Anti-S 
Deviance Residuals: 
Min 1Q Median 3Q Max 
-6.0875 0.359 0.1374 0.5489 3.6782 


Coefficients: 
Estimate Std. Error t value Pr(>|t|) 
(Intercept) 6.355241 0.712105 8.925 6.33E-16 *** 
age 0.003949 0.007573 0.521 0.60272 
sex 0.003601 0.198135 0.018 0.98552 
LC_Status 0.681862 0.230898 2.953 0.00358 ** 
VAD 1.525492 0.128148 11.904 < 2.00E-16 *** 
BMI 0.025414 0.014638 1.736 0.08433 . 
Anti-S1 
Deviance Residuals: 
Min 1Q Median 3Q Max 


6.1029 -0.6179 0.0336 0.794 3.7732 


Coefficients: 
Estimate Std. Error t value Pr(>|t|) 
(Intercept) 4.754091 0.808882 5.877 2.09E-08 *** 
age 0.000207 0.008602 0.024 0.9808 
sex -0.05942 0.225062 -0.264 0.7921 
Lc_Status 1.425864 0.262278 5.436 1.836-07 *** 
VAD 1.903577 0.145563 13.077 < 2.00E-16 *** 
BMI 0.040236 0.016628 2.42 0.0166 * 
Anti-RBD 
Residuals: 
Min 1Q Median 3Q Max 
6.0942 -0.6904 0.133 0.82 5.5916 
Coefficients: 
Estimate Std. Error t value Pr(>|t|) 
(Intercept) 4.702146 0.897883 5.237 4.70€-07 *** 
age 0.007358 0.009549 0.771 0.442 
sex 0.059682 0.249826 0.239 0.8115 
Lc_Status 0.742004 0.291136 2.549 0.0117 * 
VAD 2.182805 0.161579 13.509 < 2.00E-16 *** 
BMI 0.039673 0.018457 2.149 0.033 * 
Seropositiv Spike Motif KFLPFQQ 
Residuals: 
Min 1Q Median 3Q Max 
-14.263 6.93 -3.938 0.66 55.635 
Coefficients: 
Estimate Std. Error t value Pr(>|t|) 
(Intercept) 0.37259 7.96592 0.047 0.96275 
age -0.07079 0.08471 -0.836 0.40452 
sex -2.26258 2.21642 -1.021 0.30876 
Lc_Status 8.16153 2.58293 3.16 0.00186 ** 
VAD 2.61206 1.43352 1.822 0.07016 . 
BMI 0.15053 0.16375 0.919 0.35924 
Seropositiv Spike Motif LDKWYF 
Residuals: 
Min 1Q Median 3Q Max 
-24.663  -12.466 — 6.899 5.72 102,332 
Coefficients: 
Estimate Std. Error t value Pr(>|t|) 
(Intercept) -0.54018 11.641 -0.046 0.963 
age 0.04612 0.1238 _-0.373 0.7099 
sex “3.58417 3.23897 -1.107 0.27 
Lc_Status 6.0783 3.77456 1.61 0:1091 
VAD 3.36872 2.09487 1.608 0.1096 
BMI 0.52293 0.2393 2.185 0.0302 * 
Seropositiv Spike Motif. RDPQTLE 
Residuals: 
Min 1Q Median 3Q Max 


8.71 -4.904 -2.189 0.169 94.84 


Coefficients: 

Estimate . Std. Error t value Pr(>|t|) 
(Intercept) -3.84559 6.872183 0.56 0.57648 
age 0.008607 0.073083 0.118 0.90639 
sex 1.20714 1.912105 -0.631 0.52867 
LC_Status 6.383147 2.228285 2.865 0.00469 ** 
VAD 2.028998 1.236691 1.641 0.10268 
BMI 0.073859 0.141268 0.523 0.60176 
Seropositiv Spike Motif DISGI 
Residuals: 

Min 1Q Median 3Q Max 


-12.264 6.206 4.179 -1.252 110.718 


Coefficients: 
Estimate Std. Error t value Pr(>|t|) 

(Intercept) -7.55943 9.121243 -0.829 0.4084 

age -0.00614 0.097001 0.063 0.9496 

sex 2.036719 2.53788 0.803 0.4233 

Lc_Status 5.940301 2.957536 2.009 0.0461 * 

VAD 1.630025 1.641423 0.993 0.3221 

BMI 0.160742 0.187501 0.857 0.3925 


Extended Data Table 5 


Generalized linear regression model 
zscore_Cortisol ~1+x0_Demographics_ Age + x0_Demographics_Sex +x0_Demographics_BMI +x0_Sample_Time_Min + x0_LC_Status +x0_Study_Cohort 
Distribution = Normal 


Estimated Coefficients: 


Estimate SE tStat pValue 

(Intercept) 1.6342 0.32856 4.9737 1.33E-06 
x0_Demographics_Age -0.00847 0.002902 -2.9186 0.003882 
x0_Demographics_Sex_2 -0.02619 0.081514 -0.32131 0.74828 
x0_Demographics_BMI -0.00774 0.005611 -1.3798 0.16905 
x0_Sample_Time_Min -0.00065 0.000431 -1.5003 0.13498 
x0_LC_Status_1 -1.2198 0.089809 -13.582. 6.92E-31 
x0_Study_Cohort_2 0.68657 0.1 6.8658 6.73E-11 


226 observations, 219 error degrees of freedom 
Estimated Dispersion: 0.304 
F-statistic vs. constant model: 44.1, p-value =4.04E-35 


Extended Data Table 6 


kappa kappa_lower_ci kappa_upperaci 
0.786407767 0.645964174 0.92685136 


Extended Data Table 7 
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Statistics 
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


— A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


— For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection All participant survey data were collected and securely stored using REDCap 13.4 (Research Electronic Data Capture) electronic data capture 
tools hosted within the Mount Sinai Health System. All other de-identified research data were stored securely in password protected internal 
electronic repositories. All Flow Cytometry data was collected and analyzed using FlowJo software version 10.8 software (BD). 


Data analysis All data analysis was performed using MATLAB (2023b), R, and GraphPad Prism (9.8.1). A repository of computer code used for analysis can be 
found at: https://github.com/rahuldhodapkar/puddlr 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and 
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- Adescription of any restrictions on data availability 


- For clinical datasets or third party data, please ensure that the statement adheres to our policy 


All research data for study participants used in this manuscript are included in Supplementary Table 3. All of the raw fcs files for the flow cytometry analysis are 
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available at the FlowRepository platform (http://flowrepository.org/) under Repository ID: FR-FCM-Z6KL. Accession numbers for protein structure are used UniProt 
and are as follows: trimeric Spike (PDB: 6VXX) and EBV gH/gL (PDB: 5T1D). 


Human research participants 


Policy information about studies involving human research participants and Sex and Gender in Research. 


Reporting on sex and gender Sex was determined through self-report and review of electronic medical records. No sex disaggregated analysis was 
performed. Study demographics, including proportion sex by individual study group, are included in Extended Table 1. 


Population characteristics All relevant population demographics are described in Extended Table 1. 


Recruitment Participants with persistent symptoms following acute COVID-19 were recruited from Long COVID clinics within the Mount 
Sinai Healthcare System and the Center for Post COVID Care at Mount Sinai Hospital. Participants enrolled in healthy and 
convalescent study arms were recruited via IRB-approved advertisements delivered through email lists, study flyers located in 
hospital public spaces, and on social media platforms. Informed consent was provided by all participants at the time of 
enrollment. Individuals in the external Long COVID group (“Ext. LC’) were identified from The Winchester Center for Lung 
Disease's Post-COVID-19 Recovery Program at Yale New Haven Hospital by collaborating clinicians. Recruitment from 
treatment clinics predisposes this study to a degree of self-selection bias among participants, which was accounted for 
through demographic matching procedures. 


Ethics oversight This study was approved by the Mount Sinai Program for the Protection of Human Subjects (IRB #20-01758) and Yale 
Institutional Review Board (IRB #2000029451 for MY-LC; IRB #2000028924 for enrollment of pre-vaccinated Healthy 
Controls; HIC #2000026109 for External Long COVID). Informed consent was obtained from all enrolled participants. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Field-specific reporting 


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


4 Life sciences [| Behavioural & social sciences [| Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size Sample size was not predetermined prior to enrollment of study participants. Samples sizes were chosen based on prior experience with 
multiplexed immune phenotyping assays and available study resources. 


Data exclusions | Data exclusions are stated explicitly in Methods under the heading "MY-LC Study Design, Enrollment Strategy, and Inclusion / Exclusion 
Criteria " and are reproduced here for convenience: "Inclusion criteria for individuals in the Long COVID group (“LC”) were age > 18 years; 
previous confirmed or probable COVID-19 infection (according to World Health Organization guidelines1); and persistent symptoms > 6 weeks 
following initial COVID-19 infection. Inclusion criteria for enrollment of individuals in the healthy control group (“HC”) were age > 18 years, no 
prior COVID-19 infection, and completion of a brief, semi-structured verbal screening with research staff confirming no active 
symptomatology. Inclusion criteria for individuals in the convalescent control group (“CC”) were age = 18 years; previous confirmed or 
probable prior COVID-19 infection; and completion of a brief, semi-structured verbal screening with research staff confirming no active 
symptomatology. 


Pre-specified exclusion criteria for this study were inability to provide informed consent; and any condition preventing a blood test from being 
performed. Additionally, all participants had electronic health records reviewed by study clinicians following enrollment and were 
subsequently excluded prior to analyses for the following reasons: (1) current pregnancy, (2) immunosuppression equivalent to or exceeding 
prednisone 5 mg daily, (3) active malignancy or chemotherapy, and (4) any monogenic disorders. For specific immunological analyses, pre- 
existing medical conditions were additionally excluded prior to analyses due to high potential for confounding (e.g., participants with 
hypothyroidism were excluded prior to analysis of circulating T3/T4 levels; participants with pituitary adenomas were excluded prior to 
analysis of cortisol levels). Specific exclusions are marked by “A” in figures and detailed in relevant legends." 


Replication Each participant plasma and PBMC sample was partitioned into aliquots for use in various assays. Technical replicates were performed on 
patient samples where sample volume limitations permitted. When performed (e.g. ELISA, qPCR), technical replicates were successful. 


Randomization | Randomization was not applicable to this study as it is a cross-sectional, observational human research study of a pre-existing medical 
condition. 


Blinding Blinding of study investigators was not performed due to pre-existing intrinsic knowledge of clinical condition / study groups by both 
participants and investigators, as well as necessary logistical accommodations for scheduling of sample draws by study participants. 
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Behavioural & social sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Study description 


Research sample 


Sampling strategy 


Data collection 


Timing 


Data exclusions 


Non-participation 


Randomization 


Briefly describe the study type including whether data are quantitative, qualitative, or mixed-methods (e.g. qualitative cross-sectional, 
quantitative experimental, mixed-methods case study). 


State the research sample (e.g. Harvard university undergraduates, villagers in rural India) and provide relevant demographic 
information (e.g. age, sex) and indicate whether the sample is representative. Provide a rationale for the study sample chosen. For 
studies involving existing datasets, please describe the dataset and source. 


Describe the sampling procedure (e.g. random, snowball, stratified, convenience). Describe the statistical methods that were used to 
predetermine sample size OR if no sample-size calculation was performed, describe how sample sizes were chosen and provide a 
rationale for why these sample sizes are sufficient. For qualitative data, please indicate whether data saturation was considered, and 
what criteria were used to decide that no further sampling was needed. 


Provide details about the data collection procedure, including the instruments or devices used to record the data (e.g. pen and paper, 
computer, eye tracker, video or audio equipment) whether anyone was present besides the participant(s) and the researcher, and 
whether the researcher was blind to experimental condition and/or the study hypothesis during data collection. 


Indicate the start and stop dates of data collection. If there is a gap between collection periods, state the dates for each sample 
cohort. 


If no data were excluded from the analyses, state so OR if data were excluded, provide the exact number of exclusions and the 
rationale behind them, indicating whether exclusion criteria were pre-established. 


State how many participants dropped out/declined participation and the reason(s) given OR provide response rate OR state that no 
participants dropped out/declined participation. 


If participants were not allocated into experimental groups, state so OR describe how participants were allocated to groups, and if 
allocation was not random, describe how covariates were controlled. 


Ecological, evolutionary & environmental sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Study description 


Research sample 


Sampling strategy 


Data collection 


Timing and spatial scale 


Data exclusions 


Reproducibility 


Randomization 


Blinding 


Briefly describe the study. For quantitative data include treatment factors and interactions, design structure (e.g. factorial, nested, 
hierarchical), nature and number of experimental units and replicates. 


Describe the research sample (e.g. a group of tagged Passer domesticus, all Stenocereus thurberi within Organ Pipe Cactus National 
Monument), and provide a rationale for the sample choice. When relevant, describe the organism taxa, source, sex, age range and 
any manipulations. State what population the sample is meant to represent when applicable. For studies involving existing datasets, 
describe the data and its source. 


Note the sampling procedure. Describe the statistical methods that were used to predetermine sample size OR if no sample-size 
calculation was performed, describe how sample sizes were chosen and provide a rationale for why these sample sizes are sufficient. 


Describe the data collection procedure, including who recorded the data and how. 
Indicate the start and stop dates of data collection, noting the frequency and periodicity of sampling and providing a rationale for 
these choices. If there is a gap between collection periods, state the dates for each sample cohort. Specify the spatial scale from which 


the data are taken 


If no data were excluded from the analyses, state so OR if data were excluded, describe the exclusions and the rationale behind them, 
indicating whether exclusion criteria were pre-established. 


Describe the measures taken to verify the reproducibility of experimental findings. For each experiment, note whether any attempts to 
repeat the experiment failed OR state that all attempts to repeat the experiment were successful. 


Describe how samples/organisms/participants were allocated into groups. If allocation was not random, describe how covariates were 
controlled. If this is not relevant to your study, explain why. 


Describe the extent of blinding used during data acquisition and analysis. If blinding was not possible, describe why OR explain why 
blinding was not relevant to your study. 


Did the study involve field work? Yes No 
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Field work, collection and transport 


Field conditions Describe the study conditions for field work, providing relevant parameters (e.g. temperature, rainfall). 


Location State the location of the sampling or experiment, providing relevant parameters (e.g. latitude and longitude, elevation, water depth). 
Access & import/export Describe the efforts you have made to access habitats and to collect and import/export your samples in a responsible manner and in 
compliance with local, national and international laws, noting any permits that were obtained (give the name of the issuing authority, 


the date of issue, and any identifying information). 


Disturbance Describe any disturbance caused by the study and how it was minimized. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems Methods 

n/a | Involved in the study n/a | Involved in the study 
Antibodies ChIP-seq 
Eukaryotic cell lines Flow cytometry 
Palaeontology and archaeology MRI-based neuroimaging 


Clinical data 


Dual use research of concern 


| 
| 
Animals and other organisms 
| 
| 


Antibodies 


Antibodies used All antibodies, dilutions, and catalog numbers are used in this manuscript are detailed in Supplementary Table 1. 


Validation All antibodies used in this study are commercially available, and all have been validated by the manufacturers and used by other 
publications. Likewise, we titrated these antibodies according to our own our staining conditions. The following were validated in the 
following species: BB515 anti-hHLA-DR (G46-6) (BD Biosciences) (Human, Rhesus, Cynomolgus, Baboon), BV785 anti-hCD16 (3G8) 
(BioLegend) (Human, African Green, Baboon, Capuchin Monkey, Chimpanzee, Cynomolgus, Marmoset, Pigtailed Macaque, Rhesus, 
Sooty Mangabey, Squirrel Monkey), PE-Cy7 anti-hCD14 (HCD14) (BioLegend) (Human), BV605 anti-hCD3 (UCHT1) (BioLegend) 

(Human, Chimpanzee), BV711 anti-hCD19 (SJ25C1) (BD Biosciences) (Human), AlexaFluor647 anti-hCD1c (L161) (BioLegend) (Human, 

African Green, Baboon, Cynomolgus, Rhesus), Biotin anti-hCD141 (M80) (BioLegend) (Human, African Green, Baboon), PE-Dazzle594 

anti-hCD56 (HCD56) (BioLegend) (Human, African Green, Baboon, Cynomolgus, Rhesus), PE anti-hCD304 (12C2) (BioLegend) 

(Human), APCFire750 anti-hCD11b (ICRF44) (BioLegend) (Human, African Green, Baboon, Chimpanzee, Common Marmoset, 

Cynomolgus, Rhesus, Swine), PerCP/Cy5.5 anti-hCD66b (G10F5) (BD Biosciences) (Human), BV421 anti-CD15 (W6D3) (BioLegend) 

(Human), BV785 anti-hCD4 (SK3) (BioLegend) (Human), APCFire750 or BV711 anti-hCD8 (SK1) (BioLegend) (Human, Cross-Reactivity: 

African Green, Chimpanzee, Cynomolgus, Pigtailed Macaque, Rhesus, Sooty Mangabey), BV421 anti-hCCR7 (G043H7) (BioLegend) 

(Human, African Green, Baboon, Cynomolgus, Rhesus), AlexaFluor 700 anti-hCD45RA (HI100) (BD Biosciences) (Human), PE anti-hPD1 

(EH12.2H7) (BioLegend) (Human, African Green, Baboon, Chimpanzee, Common Marmoset, Cynomolgus, Rhesus, Squirrel Monkey), 

APC antihTIM3 (F38-2E2) (BioLegend) (Human), BV711 anti-hCD38 (HIT2) (BioLegend) (Human, Chimpanzee, Horse), BB700 anti- 

hCXCR5 (RF8B2) (BD Biosciences) (Human), PE-Cy7 anti-hCD127 (HIL-7R-M21) (BioLegend) (Human), PE-CF594 anti-hCD25 (BC96) (BD 

Biosciences) (Human, Rhesus, Cynomolgus, Baboon), BV421 anti-hlL-17a (N49-653) (BD Biosciences) (Human), AlexaFluor 700 anti- 

hTNFa (MAb11) (BioLegend) (Human, Cat, Cross-Reactivity: Chimpanzee, Baboon, Cynomolgus, Rhesus, Pigtailed Macaque, Sooty 

Mangabey, Swine), APC/Fire750 anti-hlFNy (4S.B3) (BioLegend) (Human, Cross-Reactivity: Chimpanzee, Baboon, Cynomolgus, 

Rhesus), FITC anti-hGranzymeB (GB11) (BioLegend) (Human, Mouse, Cross-Reactivity: Rat), AlexaFluor 647 anti-hlL-4 (8D4-8) 

(BioLegend) (Human, Cross-Reactivity: Chimpanzee, Baboon, Cynomolgus, Rhesus), BB700 anti-hCD183/CXCR3 (1C6/CXCR3) (BD 

Biosciences) (Human, Rhesus, Cynomolgus, Baboon), PE-Cy7 anti-IL-6 (MQ2-13A5) (BioLegend) (Human), PE anti-hlL-2 (5344.111) (BD 

Biosciences) (Human), BV785 anti-hCD19 (SJ25C1) (BioLegend) (Human), BV421 anti-hCD138 (MI15) (BioLegend) (Human), 

AlexaFluor700 anti-hCD20 (2H7) (BioLegend) (Human, Baboon, Capuchin Monkey, Chimpanzee, Cynomolgus, Pigtailed Macaque, 

Rhesus, Squirrel Monkey), AlexaFluor 647 anti-hCD27 (M-1T271) (BioLegend) (Human, Cross-Reacitivity: Baboon, Cynomolgus, 

Rhesus), PE/Dazzle594 anti-higD (IA6-2) (BioLegend) (Human), PE-Cy7 anti-hCD86 (IT2.2) (BioLegend) (Human, African Green, 

Baboon, Capuchin Monkey, Common Marmoset, Cotton-topped Tamarin, Chimpanzee, Cynomolgus, Rhesus), APC/Fire750 anti-hlgM 

(MHM-88) (BioLegend) (Human, African Green, Baboon, Cynomolgus, Rhesus), BV605 anti-hCD24 (MLS) (BioLegend) (Human, Cross- 

Reactivity: Chimpanzee), AlexaFluor 700 Streptavidin (1:300) (ThermoFisher). 
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Eukaryotic cell lines 


Policy information about cell lines and Sex and Gender in Research 


Cell line source(s) State the source of each cell line used and the sex of all primary cell lines and cells derived from human participants or 
vertebrate models. 


Authentication Describe the authentication procedures for each cell line used OR declare that none of the cell lines used were authenticated. 


Mycoplasma contamination Confirm that all cell lines tested negative for mycoplasma contamination OR describe the results of the testing for 
mycoplasma contamination OR declare that the cell lines were not tested for mycoplasma contamination. 


Commonly misidentified lines ame any commonly misidentified cell lines used in the study and provide a rationale for their use. 
(See ICLAC register) 
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Palaeontology and Archaeology 


Specimen provenance Provide provenance information for specimens and describe permits that were obtained for the work (including the name of the 
issuing authority, the date of issue, and any identifying information). Permits should encompass collection and, where applicable, 


export. 

Specimen deposition Indicate where the specimens have been deposited to permit free access by other researchers. 

Dating methods If new dates are provided, describe how they were obtained (e.g. collection, storage, sample pretreatment and measurement), where 
they were obtained (i.e. lab name), the calibration program and the protocol for quality assurance OR state that no new dates are 
provided. 


Tick this box to confirm that the raw and calibrated dates are available in the paper or in Supplementary Information. 


Ethics oversight Identify the organization(s) that approved or provided guidance on the study protocol, OR state that no ethical approval or guidance 
was required and explain why not. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Animals and other research organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research, and Sex and Gender in 
Research 


Laboratory animals For laboratory animals, report species, strain and age OR state that the study did not involve laboratory animals. 


Wild animals Provide details on animals observed in or captured in the field; report species and age where possible. Describe how animals were 
caught and transported and what happened to captive animals after the study (if killed, explain why and describe method; if released, 
say where and when) OR state that the study did not involve wild animals. 


Reporting on sex Indicate if findings apply to only one sex; describe whether sex was considered in study design, methods used for assigning sex. 
Provide data disaggregated for sex where this information has been collected in the source data as appropriate; provide overall 
numbers in this Reporting Summary. Please state if this information has not been collected. Report sex-based analyses where 
performed, justify reasons for lack of sex-based analysis. 


Field-collected samples | For laboratory work with field-collected samples, describe all relevant parameters such as housing, maintenance, temperature, 
photoperiod and end-of-experiment protocol OR state that the study did not involve samples collected from the field. 


Ethics oversight Identify the organization(s) that approved or provided guidance on the study protocol, OR state that no ethical approval or guidance 
was required and explain why not. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Clinical data 


Policy information about clinical studies 
All manuscripts should comply with the ICMJE guidelines for publication of clinical research and a completed CONSORT checklist must be included with all submissions. 


Clinical trial registration | Provide the trial registration number from ClinicalTrials.gov or an equivalent agency. 
Study protocol Note where the full trial protocol can be accessed OR if not available, explain why. 


Data collection Describe the settings and locales of data collection, noting the time periods of recruitment and data collection. 


Outcomes Describe how you pre-defined primary and secondary outcome measures and how you assessed these measures. 


Dual use research of concern 


Policy information about dual use research of concern 


Hazards 


Could the accidental, deliberate or reckless misuse of agents or technologies generated in the work, or the application of information presented 
in the manuscript, pose a threat to: 


No | Yes 


Public health 


National security 


Crops and/or livestock 
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| Ecosystems 


|] Any other significant area 


Experiments of concern 
Does the work involve any of these experiments of concern: 


No | Yes 


Demonstrate how to render a vaccine ineffective 


Confer resistance to therapeutically useful antibiotics or antiviral agents 


Enhance the virulence of a pathogen or render a nonpathogen virulent 


Increase transmissibility of a pathogen 


Alter the host range of a pathogen 


Enable evasion of diagnostic/detection modalities 


Enable the weaponization of a biological agent or toxin 


Any other potentially harmful combination of experiments and agents 


ChIP-seq 


Data deposition 


Confirm that both raw and final processed data have been deposited in a public database such as GEO. 


|__| Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks. 


Data access links For "Initial submission" or "Revised version" documents, provide reviewer access links. For your "Final submission" document, 
May remain private before publication. | provide a link to the deposited data. 


Files in database submission Provide a list of all files available in the database submission. 
Genome browser session Provide a link to an anonymized genome browser session for "Initial submission" and "Revised version" documents only, to 
(e.g. UCSC) 


enable peer review. Write "no longer applicable" for "Final submission" documents. 


Methodology 
Replicates Describe the experimental replicates, specifying number, type and replicate agreement. 
Sequencing depth Describe the sequencing depth for each experiment, providing the total number of reads, uniquely mapped reads, length of reads and 
whether they were paired- or single-end. 
Antibodies Describe the antibodies used for the ChIP-seq experiments; as applicable, provide supplier name, catalog number, clone name, and lot 


number. 


Peak calling parameters Specify the command line program and parameters used for read mapping and peak calling, including the ChIP, control and index files 


used. 
Data quality Describe the methods used to ensure data quality in full detail, including how many peaks are at FDR 5% and above 5-fold enrichment. 
Software Describe the software used to collect and analyze the ChiP-seq data. For custom code that has been deposited into a community 


repository, provide accession details. 


Flow Cytometry 


Plots 


Confirm that: 


The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 


The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers). 


All plots are contour plots with outliers or pseudocolor plots. 


A numerical value for number of cells or percentage (with statistics) is provided. 


Methodology 


Sample preparation Freshly isolated PBMCs were plated at 1-2 x 106 cells per well in a 96-well U-bottom plate. Cells were resuspended in Live/ 
Dead Fixable Aqua (ThermoFisher) for 20 min at 4°C. Cells were washed with PBS and followed by Human TruStain FcX 
(BioLegend) incubation for 10 min at RT. Cocktails of staining antibodies were added directly to this mixture for 30 minutes at 
RT. Prior to analysis, cells were washed and resuspended in 100 ul 4% PFA for 30 min at 4°C. For intracellular cytokine 
staining following stimulation, the surface marker-stained cells were resuspended in 200 ul cRPMI (RPMI-1640 supplemented 
with 10% FBS, 2 mM L-glutamine, 100 U/ml penicillin, and 100 mg/ml streptomycin, 1 mM sodium pyruvate) and stored at 4° 
C overnight. Subsequently, these cells were washed and stimulated with 1x Cell Stimulation Cocktail (eBioscience) in 200 ul 
cRPMI for 1 h at 37°C. Fifty ul of 5x Stimulation Cocktail in CRPMI (plus protein transport 442 inhibitor, eBioscience) was 
added for an additional 4 hours of incubation at 37°C. Following stimulation, cells were washed and resuspended in 100 ul 
4% paraformaldehyde for 30 min at 4°C. To quantify intracellular cytokines, cells were permeabilized with 1x 
permeabilization buffer from the FOXP3/Transcription Factor Staining Buffer Set (eBioscience) for 10 min at 4°C. All 
subsequent staining cocktails were made in this buffer. Permeabilized cells were then washed and resuspended in a cocktail 
containing Human TruStain FcX (BioLegend) for 10 min at 4°C. Finally, intracellular staining cocktails were added directly to 
each sample for 1h at 4°C. Following this incubation, cells were washed and prepared for analysis on an Attune NXT 
(ThermoFisher). 
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Instrument Attune NXT (ThermoFisher) 
Software Data were analyzed using FlowJo software version 10.8 software (BD). 
Cell population abundance No sorting of PBMC fractions was performed in this study. 


Gating strategy Gating Strategy is described in Extended Figure S10 


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 


Magnetic resonance imaging 


Experimental design 


Design type Indicate task or resting state; event-related or block design. 


Design specifications Specify the number of blocks, trials or experimental units per session and/or subject, and specify the length of each trial 
or block (if trials are blocked) and interval between trials. 


Behavioral performance measures State number and/or type of variables recorded (e.g. correct button press, response time) and what statistics were used 
to establish that the subjects were performing the task as expected (e.g. mean, range, and/or standard deviation across 


subjects). 
Acquisition 

Imaging type(s) Specify: functional, structural, diffusion, perfusion. 

Field strength Specify in Tesla 

Sequence & imaging parameters Specify the pulse sequence type (gradient echo, spin echo, etc.), imaging type (EPI, spiral, etc.), field of view, matrix size, 
slice thickness, orientation and TE/TR/flip angle. 

Area of acquisition State whether a whole brain scan was used OR define the area of acquisition, describing how the region was determined. 

Diffusion MRI Used Not used 


Preprocessing 


Preprocessing software Provide detail on software version and revision number and on specific parameters (model/functions, brain extraction, 


Preprocessing software segmentation, smoothing kernel size, etc.). 


Normalization If data were normalized/standardized, describe the approach(es): specify linear or non-linear and define image types used for 
transformation OR indicate that data were not normalized and explain rationale for lack of normalization. 


Normalization template Describe the template used for normalization/transformation, specifying subject space or group standardized space (e.g. 
original Talairach, MNI305, ICBM152) OR indicate that the data were not normalized. 


Noise and artifact removal Describe your procedure(s) for artifact and structured noise removal, specifying motion parameters, tissue signals and 
physiological signals (heart rate, respiration). 


Volume censoring Define your software and/or method and criteria for volume censoring, and state the extent of such censoring. 


Statistical modeling & inference 


Model type and settings Specify type (mass univariate, multivariate, RSA, predictive, etc.) and describe essential details of the model at the first and 
second levels (e.g. fixed, random or mixed effects; drift or auto-correlation). 
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Effect(s) tested Define precise effect in terms of the task or stimulus conditions instead of psychological concepts and indicate whether 
ANOVA or factorial designs were used. 


Specify type of analysis: Whole brain ROI-based Both 


Statistic type for inference Specify voxel-wise or cluster-wise and report all relevant parameters for cluster-wise methods. 
(See Eklund et al. 2016) 


Correction Describe the type of correction and how it is obtained for multiple comparisons (e.g. FWE, FDR, permutation or Monte Carlo). 


Models & analysis 


n/a | Involved in the study 


Functional and/or effective connectivity 


|_| Graph analysis 


Multivariate modeling or predictive analysis 
Functional and/or effective con nectivity Report the measures of dependence used and the model details (e.g. Pearson correlation, partial correlation, 
mutual information). 


Graph analysis Report the dependent variable and connectivity measure, specifying weighted graph or binarized graph, 
subject- or group-level, and the global and/or node summaries used (e.g. clustering coefficient, efficiency, 
etc.). 


Multivariate modeling and predictive analysis | Specify independent variables, features extraction and dimension reduction, model, training and evaluation 
metrics. 


